Abstract

Regenerating codes are a class of recently developed codes for distributed storage that, like
Reed-Solomon codes, permit data recovery from any arbitrary k of n nodes. However, regenerating
codes possess, in addition, the ability to repair a failed node by connecting to any arbitrary d
nodes and downloading an amount of data that is typically far less than the size of the data file.
This amount of download is termed the repair bandwidth. Minimum storage regenerating (MSR) codes
are a subclass of regenerating codes that require the least amount of network storage; every such
code is a maximum distance separable (MDS) code. Further, when a replacement node stores data
identical to that in the failed node, the repair is termed exact.

The four principal results of the paper are (a) the explicit construction of a class of MDS codes
for d = n - 1 ≥ 2k - 1, termed the MISER code, that achieves the cut-set bound on the repair
bandwidth for the exact-repair of systematic nodes, (b) proof of the necessity of interference
alignment in exact-repair MSR codes, (c) a proof showing the impossibility of constructing linear,
exact-repair MSR codes for d < 2k - 3 in the absence of symbol extension, and (d) the construction,
also explicit, of MSR codes for d = k + 1. Interference alignment (IA) is a theme that runs
throughout the paper: the MISER code is built on the principles of IA, and IA is also a crucial
component of the non-existence proof for d < 2k - 3. To the best of our knowledge, the
constructions presented in this paper are the first explicit constructions of regenerating codes
that achieve the cut-set bound.

I. Introduction



In a distributed storage system, information pertaining to a data file is dispersed across nodes
in a network in such a manner that an end-user (whom we term a data-collector, or DC) can retrieve
the data stored by tapping into neighboring nodes. A popular option that reduces network congestion
and leads to increased resiliency in the face of node failures is to employ erasure coding, for
example by calling upon maximum-distance-separable (MDS) codes such as Reed-Solomon (RS) codes.

Let B be the total number of message symbols, over a finite field F_q of size q. With RS codes,
data is stored across n nodes in the network in such a way that the entire data can be recovered by
a data-collector connecting to any arbitrary k nodes, a process of data recovery that we will refer
to as reconstruction. Several distributed storage systems such as RAID-6, OceanStore [1] and Total
Recall [2] employ such an erasure-coding option.

Upon failure of an individual node, a self-sustaining data storage network must necessarily possess
the ability to repair the failed node. An obvious means to accomplish this is to permit the
replacement node to connect to any k nodes, download the entire data, and extract the data that was
stored in the failed node. For example, RS codes treat the data stored in each node as a single
symbol belonging to the finite field F_q. When this is coupled with the restriction that individual
nodes perform linear operations over F_q, it follows that the smallest unit of data that can be
downloaded from a node to assist in the repair of a failed node (namely, an F_q symbol) equals the
amount of information stored in the node itself. As a consequence of the MDS property of an RS
code, when carrying out repair of a failed node, the replacement node must necessarily collect data
from at least k other nodes. As a result, the total amount of data download needed to repair a
failed node can be no smaller than B, the size of the entire file. But clearly, downloading the
entire B units of data in order to recover the data stored in a single node that stores only a
fraction of the entire data file is wasteful, and raises the question as to whether there is a
better option. Such an option is provided by the concept of a regenerating code introduced by
Dimakis et al. [3].

Fig. 1: The regenerating codes setup: (a) data reconstruction, and (b) repair of a failed node.

Regenerating codes overcome the difficulty encountered when working with an RS code by working
with codes whose symbol alphabet is a vector over F_q, i.e., an element of F_q^α for some parameter
α > 1. Each node stores a vector symbol, or equivalently stores α symbols over F_q. In this setup,
it is clear that while maintaining linearity over F_q, it is possible for an individual node to
transfer a fraction of the data stored within the node.

Apart from this new parameter α, two other parameters (d, β) are associated with regenerating
codes. Thus we have

$$\{q,\ [n,\ k,\ d],\ (\beta,\ \alpha,\ B)\}$$

as the parameter set of a regenerating code. Under the definition of regenerating codes introduced
in [3], a failed node is permitted to connect to an arbitrary subset of d nodes out of the
remaining (n - 1) nodes while downloading β ≤ α symbols from each node. The total amount dβ of data
downloaded for repair purposes is termed the repair bandwidth. Typically, with a regenerating code,
the average repair bandwidth dβ is small compared to the size of the file B. Fig. 1a and Fig. 1b
illustrate reconstruction and node repair respectively, also depicting the relevant parameters.

The cut-set bound of network coding can be invoked to show that the parameters of a regenerating
code must necessarily satisfy [?]:

$$B \le \sum_{i=0}^{k-1} \min\{\alpha,\ (d-i)\beta\}. \tag{1}$$

It is desirable to minimize both α and β, since minimizing α results in a minimum storage solution
while minimizing β (for a fixed d) results in a solution that minimizes the repair bandwidth. It
turns out that there is a tradeoff between α and β. The two extreme points in this tradeoff are
termed the minimum storage regenerating (MSR) and minimum bandwidth regenerating (MBR) points
respectively. The parameters α and β for the MSR point on the tradeoff can be obtained by first
minimizing α and then minimizing β to obtain

$$\alpha_{MSR} = \frac{B}{k}, \qquad \beta_{MSR} = \frac{B}{k(d-k+1)}. \tag{2}$$

Reversing the order leads to the MBR point, which thus corresponds to

$$\beta_{MBR} = \frac{2B}{k(2d-k+1)}, \qquad \alpha_{MBR} = \frac{2dB}{k(2d-k+1)}. \tag{3}$$

The focus of the present paper is on the MSR point. Note that regenerating codes with α = α_MSR
and β = β_MSR are necessarily MDS codes over the vector alphabet F_q^α. This follows since the
ability to reconstruct the data from any arbitrary k nodes necessarily implies a minimum distance
d_min = n - k + 1. Since the code size equals (q^α)^k, this meets the Singleton bound, making the
code an MDS code.
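The MSR and MBR operating points in (2) and (3) can be checked numerically against the cut-set bound (1). The sketch below is only illustrative: the function names (`cut_set_capacity`, `msr_point`, `mbr_point`) are ours and do not come from the paper, and exact rational arithmetic is used so that non-integral values of β are handled without rounding.

```python
from fractions import Fraction


def cut_set_capacity(k, d, alpha, beta):
    """Right-hand side of (1): sum over i = 0..k-1 of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(B, k, d):
    """(alpha, beta) at the MSR point, as in (2)."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))


def mbr_point(B, k, d):
    """(alpha, beta) at the MBR point, as in (3)."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta


# Illustrative parameters: the [n = 6, k = 3, d = 5] example used later, B = 9.
B, k, d = 9, 3, 5
for name, (alpha, beta) in (("MSR", msr_point(B, k, d)), ("MBR", mbr_point(B, k, d))):
    capacity = cut_set_capacity(k, d, alpha, beta)
    print(name, "alpha =", alpha, "beta =", beta,
          "repair bandwidth =", d * beta, "cut-set capacity =", capacity)
```

For these parameters both points meet (1) with equality, and the MSR repair bandwidth dβ = 5 is well below the file size B = 9.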

A. Choice of the Parameter β

Let us next rewrite (2) in the form

$$\alpha_{MSR} = \beta_{MSR}\,(d-k+1), \qquad B = \beta_{MSR}\,(d-k+1)\,k. \tag{4}$$

Thus if one is able to construct an [n, k, d] MSR code with repair bandwidth achieving the cut-set
bound for a given value of β, then both α_MSR = (d - k + 1) β_MSR and the size B = k α_MSR of the
file are necessarily fixed. It thus makes sense to speak of an achievable triple

$$(\beta,\ \alpha = (d-k+1)\beta,\ B = k\alpha).$$

However, if a triple (β, α, B) is achievable, then so is the triple (ℓβ, ℓα, ℓB), simply through a
process of divide and conquer, i.e., we divide up the message file into ℓ sub-files and apply the
code for (β, α, B) to each of the ℓ sub-files. Hence, codes that are applicable for the case β = 1
are of particular importance, as they permit codes to be constructed for every larger integral
value of β. In addition, a code with small β will involve manipulating a smaller number of message
symbols and hence will, in general, be of lesser complexity. For these reasons, in the present
paper, codes are constructed for the case β = 1. Setting β = 1 at the MSR point yields

$$\alpha_{MSR} = d - k + 1. \tag{5}$$

Note that when α = 1, we have B = k, and meeting the cut-set bound would imply d = k. In this case,
any [n, k]-MDS code will achieve the bound. Hence, we will consider α > 1 throughout.
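The divide-and-conquer argument above can be pictured as plain striping: a file of ℓB symbols is cut into ℓ sub-files of B symbols, and the β = 1 code is applied to each sub-file independently. The following is a minimal sketch under that reading; `scale_triple` and `split_into_subfiles` are hypothetical helper names, and the per-sub-file encoder itself is not shown.

```python
def scale_triple(beta, alpha, B, ell):
    """Triple obtained by running ell independent copies of a (beta, alpha, B) code."""
    return ell * beta, ell * alpha, ell * B


def split_into_subfiles(message, B):
    """Cut a message of length ell * B into ell sub-files of B symbols each."""
    if len(message) % B != 0:
        raise ValueError("message length must be a multiple of B")
    return [message[i:i + B] for i in range(0, len(message), B)]


# Illustrative MSR parameters with beta = 1: k = 3, d = 5, so alpha = 3 and B = 9.
k, d = 3, 5
beta, alpha = 1, d - k + 1
B = k * alpha

message = list(range(18))                  # a file of 2 * B symbols
subfiles = split_into_subfiles(message, B)
ell = len(subfiles)
print(ell, scale_triple(beta, alpha, B, ell))
```

Each node then stores ℓ blocks of α symbols, and a helper node passes ℓ symbols during repair (one per sub-file), which is exactly the triple (ℓβ, ℓα, ℓB).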



B. Additional Terminology 

1) Exact versus Functional Repair: In general, the cut-set bound (as derived in [?]) applies to
functional-repair, that is, it applies to networks which replace a failed node with a replacement
node which can carry out all the functions of the earlier failed node, but which does not
necessarily store the same data. Thus, under functional-repair, there is a need for the network to
inform all nodes in the network of the replacement. This requirement is obviated under
exact-repair, where a replacement node stores exactly the same data as was stored in the failed
node. We will use the term exact-repair MSR code to denote a regenerating code operating at the
minimum storage point that is capable of exact-repair.

2) Systematic Codes: A systematic regenerating code can be defined as a regenerating code designed
in such a way that the B message symbols are explicitly present amongst the kα code symbols stored
in a select set of k nodes, termed the systematic nodes. Clearly, in the case of systematic
regenerating codes, exact-repair of the systematic nodes is mandated. A data-collector connecting
to the k systematic nodes obtains the B message symbols in an uncoded form, making systematic nodes
a preferred choice for data recovery. This makes the fast repair of systematic nodes a priority,
motivating the interest in minimizing the repair bandwidth for the exact-repair of systematic
nodes.

The immediate question that this raises is whether or not the combination of (a) restriction to
repair of systematic nodes and (b) requirement for exact-repair of the systematic nodes leads to a
bound on the parameters (α, β) different from the cut-set bound. It turns out that the same bound
on the parameters (α, β) appearing in (2) still applies, and this is established in Section III.



C. Exact-repair MSR Codes as Network Codes 

The existence of regenerating codes for the case of functional-repair was proved ([3], [?]) after
casting the reconstruction and repair problems as a multicast network coding problem, and using
random network codes to achieve the cut-set bound. As shown in our previous work [?], construction
of exact-repair MSR codes for the repair of systematic nodes is most naturally mapped to a
non-multicast problem in network coding, for which very few results are available.

Fig. 2: The MSR code design problem for the exact-repair of just the systematic nodes, as a
non-multicast network coding problem. Here, [n = 4, k = 2, d = 3] with β = 1, giving (α = 2,
B = 4). Unmarked edges have capacity α. Nodes labelled DC are data-collector sinks, and those
labelled ℓ' are replacement node sinks.

The non-multicast network for the parameter set [n = 4, k = 2, d = 3] with β = 1 is shown in
Fig. 2. In general, the network can be viewed as having k source nodes, corresponding to the k
systematic nodes, generating α symbols each per channel use. The parity nodes correspond to
downlink nodes in the graph. To capture the fact that a parity node can store only α symbols, it is
split (as in [?]) into two parts connected by a link of capacity α: parity node m is split into
m_in and m_out, with all incoming edges arriving at m_in and all outgoing edges emanating from
m_out.

The sinks in the network are of two types. The first type corresponds to data-collectors, which
connect to an arbitrary collection of k nodes in the network for the purposes of data
reconstruction. Hence there are $\binom{n}{k}$ sinks of this type. The second type of sink
represents a replacement node that is attempting to duplicate a failed systematic node, with the
node replacing systematic node ℓ denoted by ℓ'. Sinks of this type connect to an arbitrary set of d
out of the remaining (n - 1) nodes, and hence they are $k\binom{n-1}{d}$ in number. It is the
presence of these sinks that gives the problem a non-multicast nature.
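The two sink counts above are simple binomial expressions, and evaluating them for the parameter sets used in this paper gives a feel for the size of the network. The helper name `sink_counts` below is ours, introduced only for illustration.

```python
from math import comb


def sink_counts(n, k, d):
    """Number of data-collector sinks and of replacement-node sinks.

    Data collectors connect to any k of the n nodes; each of the k systematic
    nodes can be replaced using any d of the remaining n - 1 nodes.
    """
    data_collectors = comb(n, k)
    replacement_sinks = k * comb(n - 1, d)
    return data_collectors, replacement_sinks


print(sink_counts(4, 2, 3))   # network of Fig. 2
print(sink_counts(6, 3, 5))   # the [6, 3, 5] example of Section V
```

For [n = 4, k = 2, d = 3] this gives 6 data-collector sinks and 2 replacement sinks; for [n = 6, k = 3, d = 5] it gives 20 and 3.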

Thus, the present paper provides an instance where explicit code constructions achieve the cut-set bound 
for a non-multicast network, by exploiting the specific structure of the network. 

Relation Between β and Scalar/Vector Network Coding: The choice of β as unity (as in Fig. 2) may
be viewed as an instance of scalar network coding. Upon increase in the value of β, the capacity of
each data pipe is increased by a factor of β, thereby transforming the problem into a vector
network coding problem. Thus, β = 1 implies the absence of symbol extension, which in general
reduces the complexity of system implementation and is thus of greater practical interest.

D. Results of the Present Paper 

The primary results of the present paper are:

• The construction of a family of MDS codes for d = n - 1 ≥ 2k - 1 that enable exact-repair of
systematic nodes while achieving the cut-set bound on repair bandwidth. We have termed this code
the MISER¹ code.

• Proof that interference alignment is necessary for every exact-repair MSR code.

• The proof of non-existence of linear exact-repair MSR codes for d < 2k - 3 in the absence of
symbol extension (i.e., β = 1). This result is clearly of interest in the light of on-going efforts
to construct exact-repair codes with β = 1 meeting the cut-set bound [?].

• The construction, also explicit, of an MSR code for d = k + 1. For most values of the parameters,
d = k + 1 falls under the d < 2k - 3 regime, and in light of the non-existence result above,
exact-repair is not possible. The construction does the next best thing, namely, it carries out
repair that is approximately exact².

Note that the only explicit codes of the MDS type to previously have been constructed are for small
values of parameters, [n = 4, k = 2, d = 3] and [n = 5, k = 3, d = 4]. Prior work is described in
greater detail in Section II.

The remainder of the paper is organized as follows. A brief overview of the prior literature in
this field is given in the next section, Section II. The setting and notation are explained in
Section III. The appearance of interference alignment in the context of distributed storage for
construction of regenerating codes is detailed in Section IV along with an illustrative example.
Section V describes the MISER code. The non-existence of linear exact-repair MSR codes for
d < 2k - 3 in the absence of symbol extension can be found in Section VI, along with the proof
establishing the necessity of interference alignment. Section VII describes the explicit
construction of an MSR code for d = k + 1. The final section, Section VIII, draws conclusions.

¹ Short for an MDS, Interference-aligning, Systematic, Exact-Regenerating code, that is miserly in
terms of bandwidth expended to repair a systematic node.

² The code consists of an exact-repair part along with an auxiliary part whose repair is not
guaranteed to be exact. This is explained in greater detail in Section VII.



II. Prior Work 

The concept of regenerating codes, introduced in [3], [?], permits storage nodes to store more
than the minimal B/k units of data in order to reduce the repair bandwidth. Several distributed
systems are analyzed, and estimates of the mean node availability in such systems are obtained.
Using these values, the substantial performance gains offered by regenerating codes in terms of
bandwidth savings are demonstrated.

The problem of minimizing repair bandwidth for the functional repair of nodes is considered in
[3], [?], where it is formulated as a multicast network-coding problem in a network having an
infinite number of nodes. A cut-set lower bound on the repair bandwidth is derived. Coding schemes
achieving this bound are presented in [?], [6]; these, however, are non-explicit. These schemes
require large field size, and the repair and reconstruction algorithms are also of high complexity.

Computational complexity is identified as a principal concern in the practical implementation of
distributed storage codes in [5], and a treatment of the use of random, linear, regenerating codes
for achieving functional-repair can be found there.

The authors in [7] and [8] independently introduce the notion of exact-repair. The idea of using
interference alignment in the context of exact-repair codes for distributed storage appears first
in [7]. Code constructions of the MDS type are provided, which meet the cut-set lower bound when
k = 2. Even here, the constructions are not explicit, and have large complexity and field-size
requirement.

The first explicit construction of regenerating codes for the MBR point appears in [8], for the
case d = n - 1. These codes carry out uncoded exact-repair and hence have zero repair complexity.
The required field size is of the order of n^2, and in terms of minimizing bandwidth, the codes
achieve the cut-set bound.

A computer search for exact-repair MSR codes for the parameter set [n = 5, k = 3, d = 4], β = 1,
is carried out in [?], and for this set of parameters, codes for several values of field size are
obtained.

A slightly different setting from the exact-repair situation is considered in [11], where optimal
MDS codes are given for the parameters d = k + 1 and n > 2k. Again, the schemes given here are
non-explicit, and have high complexity and large field-size requirement.

We next describe the setting and notation to be used in the current paper. 

III. Setting and Notation 

The distributed storage system considered in this paper consists of n storage nodes, each having
the capacity to store α symbols. Let u be the message vector of length B comprising the B message
symbols. Each message symbol can independently take values from F_q, a finite field of size q.

In this paper, we consider only linear storage codes. As in traditional coding theory, by a linear
storage code, we mean that every stored symbol is a linear combination of the message symbols, and
only linear operations are permitted on the stored symbols. Thus all symbols considered belong to
F_q.

For m = 1, ..., n, let the (B × α) matrix G^(m) denote the generator matrix of node m. Node m
stores the following α symbols:

$$u^t G^{(m)}. \tag{6}$$

In the terminology of network coding, each column of the nodal generator matrix G^(m) corresponds
to the global kernel (linear combination vector) associated to a symbol stored in the node. The
(B × nα) generator matrix for the entire distributed-storage code is given by

$$G = \begin{bmatrix} G^{(1)} & G^{(2)} & \cdots & G^{(n)} \end{bmatrix}. \tag{7}$$

Note that under exact-repair, the generator matrix of the code remains unchanged.

We will interchangeably speak of a node as either storing α symbols, by which we will mean the
symbols u^t G^(m), or else as storing α vectors, by which we will mean the corresponding set of α
global kernels that form the columns of the nodal generator matrix G^(m).



We partition the B (= kα)-length vector u into k components, u_i for i = 1, ..., k, each
comprising α distinct message symbols:

$$u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_k \end{bmatrix}. \tag{8}$$

We also partition the nodal generator matrices analogously into k sub-matrices as

$$G^{(m)} = \begin{bmatrix} G_1^{(m)} \\ G_2^{(m)} \\ \vdots \\ G_k^{(m)} \end{bmatrix}, \tag{9}$$

where each G_i^(m) is an (α × α) matrix. We will refer to G_i^(m) as the i-th component of G^(m).
Thus, node m stores the α symbols

$$u^t G^{(m)} = \sum_{i=1}^{k} u_i^t\, G_i^{(m)}. \tag{10}$$

Out of the n nodes, the first k nodes (i.e., nodes 1, ..., k) are systematic. Thus, for systematic
node ℓ,

$$G_i^{(\ell)} = \begin{cases} I_\alpha & \text{if } i = \ell \\ 0_\alpha & \text{if } i \ne \ell \end{cases} \qquad \forall\, i \in \{1, \ldots, k\}, \tag{11}$$

where 0_α and I_α denote the (α × α) zero matrix and identity matrix respectively; systematic node
ℓ thus stores the α message symbols that u_ℓ is comprised of.
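To make the notation of (6) and (11) concrete, the sketch below builds the nodal generator matrix of a systematic node and evaluates the stored symbols u^t G^(m) over a prime field. The function names and the sample message are illustrative assumptions; matrices are plain nested lists indexed from zero, and q is assumed prime so that arithmetic mod q is field arithmetic.

```python
def systematic_generator(ell, k, alpha):
    """(B x alpha) generator matrix of systematic node ell (1-indexed), following (11).

    The ell-th (alpha x alpha) block is the identity; all other blocks are zero.
    """
    B = k * alpha
    G = [[0] * alpha for _ in range(B)]
    for j in range(alpha):
        G[(ell - 1) * alpha + j][j] = 1
    return G


def stored_symbols(u, G, q):
    """The alpha symbols u^t G stored by a node, computed over F_q (q prime)."""
    alpha = len(G[0])
    return [sum(u[r] * G[r][c] for r in range(len(u))) % q for c in range(alpha)]


# Illustrative values: k = 3, alpha = 3, field F_7, and an arbitrary message u.
k, alpha, q = 3, 3, 7
u = [1, 2, 3, 4, 5, 6, 0, 1, 2]           # u_1 = (1,2,3), u_2 = (4,5,6), u_3 = (0,1,2)
node2 = stored_symbols(u, systematic_generator(2, k, alpha), q)
print(node2)                               # systematic node 2 holds u_2 uncoded
```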

Upon failure of a node, the replacement node connects to an arbitrary set of d remaining nodes,
termed helper nodes, downloading β symbols from each. Thus, each helper node passes a collection of
β linear combinations of the symbols stored within the node. As described in Section I-A, an MSR
code with β = 1 can be used to construct an MSR code for every higher integral value of β. Thus it
suffices to provide constructions for β = 1, and that is what we do here. When β = 1, each helper
node passes just a single symbol. Again, we will often describe the symbol passed by a helper node
in terms of its associated global kernel, and hence will often speak of a helper node passing a
vector³.

Throughout the paper, we use superscripts to refer to node indices, and subscripts to index the
elements of a matrix. The letters m and ℓ are reserved for node indices; in particular, the letter
ℓ is used to index systematic nodes. All vectors are assumed to be column vectors. The vector e_i
represents the standard basis vector of length α, i.e., e_i is an α-length unit vector with 1 in
the i-th position and 0s elsewhere. For a positive integer p, we denote the (p × p) zero matrix and
the (p × p) identity matrix by 0_p and I_p respectively. We say that a set of vectors is aligned if
the vector-space spanned by them has dimension at most one.

We next turn our attention to the question as to whether or not the combination of (a) restriction
to systematic-node repair and (b) requirement of exact-repair of the systematic nodes leads to a
bound on the parameters (α, β) different from the cut-set bound appearing in (1).

The theorem below shows that the cut-set bound comes into play even if functional repair of a
single node is required.

Theorem 1: Any [n, k, d]-MDS regenerating code (i.e., a regenerating code satisfying B = kα) that
guarantees the functional-repair of even a single node must satisfy the cut-set lower bound of (1)
on repair bandwidth, i.e., must satisfy

$$\beta \ge \frac{B}{k(d-k+1)}. \tag{12}$$

³ A simple extension to the case of β > 1 lets us treat the global kernels of the β symbols passed
by a helper node as a subspace of dimension at most β. This 'subspace' viewpoint has been found
useful in proving certain general results at the MBR point in [8], and for the interior points of
the tradeoff in [13].

Proof: First, consider the case when β = 1. Let ℓ denote the node that needs to be repaired, and
let {m_i | i = 1, ..., d} denote the d helper nodes assisting in the repair of node ℓ. Further, let
{γ^(m_i, ℓ) | i = 1, ..., d} denote the vectors passed by these helper nodes. At the end of the
repair process, let the (B × α) matrix G^(ℓ) denote the generator matrix of the replacement node
(since we consider only functional-repair in this theorem, G^(ℓ) need not be identical to the
generator matrix of the failed node).

Looking back at the repair process, the replacement node obtains G^(ℓ) by operating linearly on
the collection of d vectors {γ^(m_i, ℓ) | i = 1, ..., d} of length B. This, in turn, implies that
the dimension of the nullspace of the matrix

$$\begin{bmatrix} G^{(\ell)} & \gamma^{(m_1,\ell)} & \cdots & \gamma^{(m_d,\ell)} \end{bmatrix} \tag{13}$$

should be greater than or equal to the dimension of G^(ℓ), which is α. However, the MDS property
requires that at the end of the repair process, the global kernels associated to any k nodes be
linearly independent, and in particular, that the matrix

$$\begin{bmatrix} G^{(\ell)} & \gamma^{(m_1,\ell)} & \cdots & \gamma^{(m_{k-1},\ell)} \end{bmatrix} \tag{14}$$

have full-rank. It follows that we must have

$$d \ge k - 1 + \alpha.$$

The proof for the case β > 1, when every helper node passes a set of β vectors, is a
straightforward extension that leads to:

$$d\beta \ge (k-1)\beta + \alpha. \tag{15}$$

Rearranging the terms in the equation above, and substituting α = B/k, leads to the desired
result. ■

Thus, we recover equation (2), and in an optimal code with β = 1, we will continue to have

$$d = k - 1 + \alpha.$$

In this way, we have shown that even the setting that we address here, namely the exact-repair of
only the systematic nodes, leads to the same cut-set bound on repair bandwidth as in (1). The next
section explains how the concept of interference alignment arises in the distributed-storage
context.
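For readers who want the last step of the proof of Theorem 1 spelled out, the rearrangement of the inequality dβ ≥ (k - 1)β + α under the MDS condition B = kα can be written as below. The numerical instance uses the [n = 6, k = 3, d = 5] parameters of Section V and is included only as an illustration.

```latex
\begin{align*}
d\beta \ge (k-1)\beta + \alpha
  &\iff (d-k+1)\,\beta \ge \alpha = \frac{B}{k} \\
  &\iff \beta \ge \frac{B}{k(d-k+1)} .
\end{align*}
% Illustration with k = 3, d = 5, B = 9:
%   \beta \ge 9 / (3 \cdot 3) = 1, so the repair bandwidth is d\beta \ge 5,
%   compared with B = 9 symbols when the whole file is downloaded.
```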

IV. Interference Alignment in Regenerating Codes 



The idea of interference alignment has recently been proposed in [19], [20] in the context of
wireless communication. The idea here is to design the signals of multiple users in such a way that
at every receiver, signals from all the unintended users occupy a subspace of the given space,
leaving the remainder of the space free for the signal of the intended user.

In the distributed-storage context, the concept of 'interference' comes into play during the
exact-repair of a failed node in an MSR code. We present the example of a systematic MSR code with
[n = 4, k = 2, d = 3] and β = 1, which gives (α = d - k + 1 = 2, B = kα = 4). Let
{u_1, u_2, u_3, u_4} denote the four message symbols. Since k = 2 here, we may assume that nodes 1
and 2 are systematic and that node 1 stores {u_1, u_2} and node 2 stores {u_3, u_4}. Nodes 3 and 4
are then the parity nodes, each storing two linear functions of the message symbols.

Consider repair of systematic node 1 wherein the d = 3 nodes, nodes 2, 3 and 4, serve as helper
nodes. The second systematic node, node 2, can only pass a linear combination of message symbols
u_3 and u_4. The two symbols passed by the parity nodes are, in general, functions of all four
message symbols: (a_1 u_1 + a_2 u_2 + a_3 u_3 + a_4 u_4) and (b_1 u_1 + b_2 u_2 + b_3 u_3 + b_4 u_4)
respectively.

Fig. 3: Illustration of interference alignment during exact-repair of systematic node 1.

Using the symbols passed by the three helper nodes, the replacement of node 1 needs to be able to
recover message symbols {u_1, u_2}. For obvious reasons, we will term (a_1 u_1 + a_2 u_2) and
(b_1 u_1 + b_2 u_2) the desired components of the messages passed by parity nodes 3 and 4, and the
terms (a_3 u_3 + a_4 u_4) and (b_3 u_3 + b_4 u_4) the interference components.

Since node 2 cannot provide any information pertaining to the desired symbols {u_1, u_2}, the
replacement node must be able to recover the desired symbols from the desired components
(a_1 u_1 + a_2 u_2) and (b_1 u_1 + b_2 u_2) of the messages passed to it by the parity nodes 3 and
4. To access the desired components, the replacement node must be in a position to subtract out the
interference components (a_3 u_3 + a_4 u_4) and (b_3 u_3 + b_4 u_4) from the received linear
combinations (a_1 u_1 + a_2 u_2 + a_3 u_3 + a_4 u_4) and (b_1 u_1 + b_2 u_2 + b_3 u_3 + b_4 u_4);
the only way to subtract out the interference component is by making use of the linear combination
of {u_3, u_4} passed by node 2. It follows that this can only happen if the interference components
(a_3 u_3 + a_4 u_4) and (b_3 u_3 + b_4 u_4) are aligned, meaning that they are scalar multiples of
each other.

An explicit code over F_5 for the parameters chosen in the example is shown in Fig. 3. The
exact-repair of systematic node 1 is shown, for which the remaining nodes pass the first of the two
symbols stored in them. Observe that under this code, the interference components in the two
symbols passed by the parity nodes are aligned in the direction of u_3, i.e., are scalar multiples
of u_3. Hence node 2 can simply pass u_3, and the replacement node can then make use of u_3 to
cancel (i.e., subtract out) the interference.
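The cancellation step described above can be traced with a small computation over F_5. The coefficient vectors `a` and `b` below are hypothetical values chosen so that the interference is aligned along u_3; they are not the code of Fig. 3, and the sketch covers only the repair of node 1, not the full MDS code.

```python
from itertools import product

q = 5  # field size of the example, F_5

# Hypothetical coefficients (a_1..a_4) and (b_1..b_4) of the symbols passed by
# parity nodes 3 and 4. Both have zero weight on u_4, so the interference
# components a_3*u_3 and b_3*u_3 are aligned along u_3.
a = [1, 2, 3, 0]
b = [1, 3, 1, 0]


def passed_symbol(coeffs, u):
    """Linear combination of the four message symbols, over F_q."""
    return sum(c * x for c, x in zip(coeffs, u)) % q


def repair_node_1(p3, p4, u3):
    """Recover (u_1, u_2) from the symbols of nodes 3 and 4 and u_3 from node 2."""
    # Cancel the aligned interference using the single symbol passed by node 2.
    r3 = (p3 - a[2] * u3) % q
    r4 = (p4 - b[2] * u3) % q
    # Solve the 2 x 2 system in the desired components by Cramer's rule.
    det = (a[0] * b[1] - a[1] * b[0]) % q
    det_inv = pow(det, q - 2, q)            # inverse via Fermat's little theorem
    u1 = (det_inv * (b[1] * r3 - a[1] * r4)) % q
    u2 = (det_inv * (a[0] * r4 - b[0] * r3)) % q
    return u1, u2


all_ok = all(
    repair_node_1(passed_symbol(a, u), passed_symbol(b, u), u[2]) == (u[0], u[1])
    for u in product(range(q), repeat=4)
)
print(all_ok)
```

If `a[3]` and `b[3]` were chosen so that the two interference components were not scalar multiples of each other, a single symbol from node 2 would no longer suffice to cancel both, which is the point of the alignment requirement.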

In the context of regenerating codes, interference alignment was first used by Wu et al. [7] to
provide a scheme (although not explicit) for exact-repair at the MSR point. However, interference
alignment is employed only to a limited extent, as only a portion of the interference components is
aligned, and as a result the scheme is optimal only for the case k = 2.

In the next section, we describe the construction of the MISER code, which aligns interference and
achieves the cut-set bound on the repair bandwidth for repair of systematic nodes. This is the
first interference-alignment-based explicit code construction that meets the cut-set bound.



V. Construction of the MISER Code 

In this section we provide an explicit construction for a systematic MDS code that achieves the
lower bound on repair bandwidth for the exact-repair of systematic nodes, and which we term the
MISER code. We begin with an illustrative example that explains the key ideas behind the
construction. The general code construction for parameter sets of the form n = 2k, d = n - 1
closely follows the construction in the example. A simple code-shortening technique is then
employed to extend this code construction to the more general parameter set n ≥ 2k, d = n - 1.

The construction technique can also be extended to the even more general case of arbitrary n,
d ≥ 2k - 1, under the added requirement, however, that the replacement node connect to all of the
remaining systematic nodes.




A. An Example 

The example deals with the parameter set [n = 6, k = 3, d = 5], β = 1, so that
(α = d - k + 1 = 3, B = kα = 9). We select F_7 as the underlying finite field, so that all message
and code symbols are drawn from F_7. Note that we have α = k = 3 here. This is true in general:
whenever n = 2k and d = n - 1, we have α = d - k + 1 = k, which simplifies the task of code
construction.



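1) Design of Nodal Generator Matrices: As k = 3, the first three nodes are systematic and store
data in uncoded form. Hence

$$G^{(1)} = \begin{bmatrix} I_3 \\ 0_3 \\ 0_3 \end{bmatrix}, \qquad
G^{(2)} = \begin{bmatrix} 0_3 \\ I_3 \\ 0_3 \end{bmatrix}, \qquad
G^{(3)} = \begin{bmatrix} 0_3 \\ 0_3 \\ I_3 \end{bmatrix}. \tag{16}$$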



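A key ingredient of the code construction presented here is the use of a Cauchy matrix [21]. Let

$$\Psi_3 = \begin{bmatrix} \psi^{(4)} & \psi^{(5)} & \psi^{(6)} \end{bmatrix}
= \begin{bmatrix}
\psi_1^{(4)} & \psi_1^{(5)} & \psi_1^{(6)} \\
\psi_2^{(4)} & \psi_2^{(5)} & \psi_2^{(6)} \\
\psi_3^{(4)} & \psi_3^{(5)} & \psi_3^{(6)}
\end{bmatrix} \tag{17}$$

be a (3 × 3) matrix such that each of its sub-matrices is full rank. Cauchy matrices have this
property, and in our construction we will assume Ψ_3 to be a Cauchy matrix.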



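We choose the generator matrix of parity node m (m = 4, 5, 6) to be

$$G^{(m)} = \begin{bmatrix}
2\psi_1^{(m)} & 0 & 0 \\
2\psi_2^{(m)} & \psi_1^{(m)} & 0 \\
2\psi_3^{(m)} & 0 & \psi_1^{(m)} \\ \hline
\psi_2^{(m)} & 2\psi_1^{(m)} & 0 \\
0 & 2\psi_2^{(m)} & 0 \\
0 & 2\psi_3^{(m)} & \psi_2^{(m)} \\ \hline
\psi_3^{(m)} & 0 & 2\psi_1^{(m)} \\
0 & \psi_3^{(m)} & 2\psi_2^{(m)} \\
0 & 0 & 2\psi_3^{(m)}
\end{bmatrix}, \tag{18}$$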



where the location of the non-zero entries of the i-th sub-matrix are restricted to lie either
along the diagonal or else within the i-th column. The generator matrix is designed keeping in mind
the need for interference alignment, and this will be made clear in the discussion below concerning
the exact-repair of systematic nodes. The choice of scalar '2' plays an important role in the data
reconstruction property; the precise role of this scalar will become clear when this property is
discussed. An example of the [6, 3, 5] MISER code over F_7 is provided in Fig. 4, where the Cauchy
matrix Ψ_3 is chosen as

$$\Psi_3 = \begin{bmatrix} 5 & 4 & 1 \\ 2 & 5 & 4 \\ 3 & 2 & 5 \end{bmatrix}. \tag{19}$$



Also depicted in the figure is the exact-repair of node 1, for which each of the remaining nodes
passes the first symbol that it stores. It can be seen that the first symbols stored in the three
parity nodes 4, 5 and 6 have their interference components (components 2 and 3) aligned and their
desired components (component 1) linearly independent.

Fig. 4: An example of the [6, 3, 5] MISER code over F_7. Here, {u_1, ..., u_9} denote the message
symbols and the code symbols stored in each of the nodes are shown. Exact-repair of node 1 is also
depicted.
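The structure of the parity-node generator matrices can be reproduced programmatically. The sketch below is a reading of (18) and (19) rather than the authors' implementation: it assumes that the columns of the matrix in (19) are ψ^(4), ψ^(5), ψ^(6), and the names `PSI`, `EPS`, and `parity_generator` are ours. It builds each (9 × 3) matrix over F_7 and checks the alignment property claimed for the first symbol of each parity node.

```python
Q = 7            # field F_7
EPS = 2          # the scalar '2' appearing in (18)
K = 3            # k = alpha = 3 in the [6, 3, 5] example

# Assumed reading of (19): column c holds psi^(4 + c).
PSI = [[5, 4, 1],
       [2, 5, 4],
       [3, 2, 5]]


def parity_generator(m):
    """(9 x 3) generator matrix of parity node m in {4, 5, 6}, following (18)."""
    psi = [PSI[r][m - K - 1] for r in range(K)]
    G = [[0] * K for _ in range(K * K)]
    for i in range(K):                 # i-th (3 x 3) sub-matrix
        for j in range(K):             # j-th column
            if j == i:                 # i-th column of block i carries 2 * psi
                for r in range(K):
                    G[i * K + r][j] = (EPS * psi[r]) % Q
            else:                      # remaining diagonal entries carry psi_i
                G[i * K + j][j] = psi[i]
    return G


for m in (4, 5, 6):
    kernel = [row[0] for row in parity_generator(m)]   # global kernel of the first symbol
    desired, interference = kernel[:K], kernel[K:]
    # Components 2 and 3 must each be a multiple of [1, 0, 0].
    aligned = all(interference[b * K + r] == 0 for b in range(K - 1) for r in range(1, K))
    print(m, desired, interference, aligned)
```

Under this reading, the first symbol stored by node 4 is 3u_1 + 4u_2 + 6u_3 + 2u_4 + 3u_7 over F_7.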



The key properties of the MISER code will be established in the next section, namely:

• that the code is an MDS code over alphabet F_q^α, and this property enables data reconstruction,
and

• that the code has the ability to carry out exact-repair of the systematic nodes while achieving
the cut-set bound on repair bandwidth.

We begin by establishing the exact-repair property. 

2) Exact-repair of Systematic Nodes: Our algorithm for systematic node repair is simple. As noted
above, each node stores α = k symbols. These k symbols are assumed to be ordered so that we may
speak of the first symbol stored by a node, etc. To repair systematic node ℓ, 1 ≤ ℓ ≤ k, each of
the remaining nodes passes its ℓ-th symbol.

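Suppose that in our example construction here, node 1 fails. Each of the parity nodes then passes its
first symbol or, equivalently, in terms of global kernels, the first column of its generator matrix, for
the repair of node 1. Thus, from nodes 4, 5, and 6, the replacement node obtains

\[
\begin{bmatrix} 2\psi_1^{(4)} \\ 2\psi_2^{(4)} \\ 2\psi_3^{(4)} \\ \psi_2^{(4)} \\ 0 \\ 0 \\ \psi_3^{(4)} \\ 0 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 2\psi_1^{(5)} \\ 2\psi_2^{(5)} \\ 2\psi_3^{(5)} \\ \psi_2^{(5)} \\ 0 \\ 0 \\ \psi_3^{(5)} \\ 0 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 2\psi_1^{(6)} \\ 2\psi_2^{(6)} \\ 2\psi_3^{(6)} \\ \psi_2^{(6)} \\ 0 \\ 0 \\ \psi_3^{(6)} \\ 0 \\ 0 \end{bmatrix}. \tag{20}
\]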



Note that in each of these vectors, the desired (first) components are a scaled version of the respective
columns of the Cauchy matrix $\Psi_3$. The interference (second and third) components are aligned along the
vector $[1\ 0\ 0]^t$. Thus, each interference component is aligned along a single dimension. Systematic nodes
2 and 3 then pass a single vector each that is designed to cancel out this interference. Specifically, nodes
2 and 3 respectively pass the vectors

\[
[\,0\ 0\ 0 \mid 1\ 0\ 0 \mid 0\ 0\ 0\,]^t, \qquad [\,0\ 0\ 0 \mid 0\ 0\ 0 \mid 1\ 0\ 0\,]^t. \tag{21}
\]



The net result is that after interference cancellation has taken place, replacement node 1 is left with
access to the columns of the matrix

\[
\begin{bmatrix} 2\Psi_3 \\ 0_3 \\ 0_3 \end{bmatrix}.
\]

Thus the desired component is a scaled Cauchy matrix $\Psi_3$. By multiplying this matrix on the right by
$\frac{1}{2}\Psi_3^{-1}$, one recovers

\[
\begin{bmatrix} I_3 \\ 0_3 \\ 0_3 \end{bmatrix}
\]

as desired.

Along similar lines, when nodes 2 or 3 fail, the parity nodes pass the second or third columns of 
their generator matrices respectively. The design of generator matrices for the parity nodes is such that 
interference alignment holds during the repair of either systematic node, hence enabling the exact-repair 
of all the systematic nodes. 
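
The repair of node 1 described above can be traced at the level of stored symbols. The Python sketch below is an illustration only, under stated assumptions: it takes the Cauchy matrix with rows (5, 4, 1), (2, 5, 4), (3, 2, 5) over F_7 (column c assumed to belong to parity node 4 + c), the scalar 2, and an arbitrary message u chosen here. The function names are hypothetical, and the final inversion is replaced by an exhaustive search over the 343 candidates purely to keep the example short (Python 3.8+ is not even required here).

```python
from itertools import product

P = 7      # the field F_7
EPS = 2    # the scalar '2' of the example
# Assumed orientation: column c holds psi^(4 + c)
PSI = [[5, 4, 1],
       [2, 5, 4],
       [3, 2, 5]]

def parity_symbols(u, c):
    """Three symbols stored by parity node 4 + c for the message u (length 9)."""
    col = [PSI[r][c] for r in range(3)]
    stored = []
    for j in range(3):                    # j-th stored symbol = u^t (j-th column of G)
        s = 0
        for i in range(3):                # contribution of component i
            comp = u[3 * i: 3 * i + 3]
            if i == j:                    # desired part: eps * psi
                s += EPS * sum(x * y for x, y in zip(comp, col))
            else:                         # interference part: psi_i * e_j
                s += col[i] * comp[j]
        stored.append(s % P)
    return stored

def repair_node_1(from_sys, from_par):
    """from_sys: first symbols of nodes 2, 3; from_par: first symbols of nodes 4, 5, 6."""
    u4, u7 = from_sys
    # Cancel the aligned interference psi_2 * u4 + psi_3 * u7
    t = [(s - PSI[1][c] * u4 - PSI[2][c] * u7) % P for c, s in enumerate(from_par)]
    # Solve 2 * [u1 u2 u3] * PSI = t by brute force (PSI is non-singular, so unique)
    sols = [x for x in product(range(P), repeat=3)
            if all(EPS * sum(x[r] * PSI[r][c] for r in range(3)) % P == t[c]
                   for c in range(3))]
    assert len(sols) == 1
    return list(sols[0])

u = [3, 1, 4, 1, 5, 2, 6, 5, 3]           # arbitrary message symbols u1..u9
helpers_sys = [u[3], u[6]]                # first symbols of nodes 2 and 3
helpers_par = [parity_symbols(u, c)[0] for c in range(3)]
print(repair_node_1(helpers_sys, helpers_par), "should equal", u[:3])
```

Only five symbols are downloaded, one from each helper, against a file size of nine symbols.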

3) Data Reconstruction (MDS property): For the reconstruction property to be satisfied, a data-collector 
downloading symbols stored in any three nodes should be able to recover all the nine message symbols. 
That is, the (9 x 9) matrix formed by columnwise concatenation of any three nodal generator matrices, 
should be non-singular. We consider the different possible sets of three nodes that the data-collector can 
connect to, and provide appropriate decoding algorithms to handle each case. 

(a) Three systematic nodes: When a data-collector connects to all three systematic nodes, it obtains all 
the message symbols in uncoded form and hence reconstruction is trivially satisfied. 

(b) Two systematic nodes and one parity node: Suppose the data-collector connects to systematic nodes
2 and 3, and parity node 4. It obtains all the symbols stored in nodes 2 and 3 in uncoded form and
proceeds to subtract their effect from the symbols in node 4. It is thus left to decode the message symbols
$\underline{u}_1$, which are encoded using the matrix $G_1^{(4)}$ given by

\[
G_1^{(4)} = \begin{bmatrix} 2\psi_1^{(4)} & 0 & 0 \\ 2\psi_2^{(4)} & \psi_1^{(4)} & 0 \\ 2\psi_3^{(4)} & 0 & \psi_1^{(4)} \end{bmatrix}. \tag{22}
\]

This lower-triangular matrix is non-singular since, by definition, all the entries in a Cauchy matrix are
non-zero. The message symbols $\underline{u}_1$ can hence be recovered by inverting $G_1^{(4)}$.

(c) All three parity nodes: We consider next the case when a data-collector connects to all three parity
nodes. Let $C_1$ be the (9 x 9) matrix formed by the columnwise concatenation of the generator matrices
of these three nodes.

Claim 1: The data-collector can recover all the message symbols encoded using the matrix $C_1$, formed
by the columnwise concatenation of the generator matrices of the three parity nodes:

\[
C_1 = \left[ G^{(4)} \ \ G^{(5)} \ \ G^{(6)} \right].
\]

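Proof: We permute the columns of $C_1$ to obtain a second matrix $C_2$ in which the $i$th ($i = 1, 2, 3$)
columns of all the three nodes are adjacent to each other as shown below:

\[
C_2 = \left[\begin{array}{c|c|c}
2\rho_1 & 0 & 0 \\
2\rho_2 & \rho_1 & 0 \\
2\rho_3 & 0 & \rho_1 \\ \hline
\rho_2 & 2\rho_1 & 0 \\
0 & 2\rho_2 & 0 \\
0 & 2\rho_3 & \rho_2 \\ \hline
\rho_3 & 0 & 2\rho_1 \\
0 & \rho_3 & 2\rho_2 \\
0 & 0 & 2\rho_3
\end{array}\right] \tag{23}
\]

Here the three block columns are group 1, group 2 and group 3 respectively; for compactness,
$\rho_i = [\psi_i^{(4)}\ \psi_i^{(5)}\ \psi_i^{(6)}]$ denotes the $i$th row of $\Psi_3$ and $0$ denotes a zero row of length 3.

Note that a permutation of the columns does not alter the information available to the data-collector
and hence is a permissible operation. This rearrangement of coded symbols, while not essential, simplifies
the proof. We then post-multiply by a block-diagonal matrix to obtain the matrix $C_3$ given by

\[
C_3 = C_2 \begin{bmatrix} \Psi_3^{-1} & 0_3 & 0_3 \\ 0_3 & \Psi_3^{-1} & 0_3 \\ 0_3 & 0_3 & \Psi_3^{-1} \end{bmatrix} \tag{24}
\]

\[
= \left[\begin{array}{ccc|ccc|ccc}
2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\ \hline
0 & 1 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 2 & 0 & 1 & 0 \\ \hline
0 & 0 & 1 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2
\end{array}\right]. \tag{25}
\]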



To put things back in perspective, the data collector at this point has access to the coded symbols

\[
\underline{u}^t C_3
\]

associated with the three parity nodes. From the nature of the matrix it is evident that message symbols
$u_1$, $u_5$ and $u_9$ are now available to the data-collector, and their effect can be subtracted from the remaining
symbols to obtain

\[
[\,u_2\ u_3\ u_4\ u_6\ u_7\ u_8\,]\, C_4, \qquad
C_4 = \begin{bmatrix}
2 & 0 & 1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 & 1 & 0 \\
1 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 1 \\
0 & 1 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 1 & 0 & 2
\end{bmatrix}. \tag{26}
\]

As $2^2 \ne 1$ in $\mathbb{F}_7$, the matrix $C_4$ above can be verified to be non-singular and thus the remaining message
symbols can also be recovered by inverting $C_4$. ■

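(d) One systematic node and two parity nodes: Suppose the data-collector connects to systematic node
1 and parity nodes 4 and 5. All symbols of node 1, i.e., $\underline{u}_1$, are available to the data-collector. Thus, it
needs to decode the message-vector components $\underline{u}_2$ and $\underline{u}_3$, which are encoded using a matrix $B_1$ given
by

\[
B_1 = \begin{bmatrix} G_2^{(4)} & G_2^{(5)} \\ G_3^{(4)} & G_3^{(5)} \end{bmatrix}. \tag{27}
\]

Claim 2: The block-matrix $B_1$ above is non-singular and in this way, the message-vector components
$\underline{u}_2$ and $\underline{u}_3$ can be recovered.

Proof: Once again, we begin by permuting the columns of $B_1$. For $i = 2, 3, 1$ (in this order), we
group the $i$th columns of the two parity nodes together to give the matrix

\[
B_2 = \left[\begin{array}{cc|cc|cc}
2\psi_1^{(4)} & 2\psi_1^{(5)} & 0 & 0 & \psi_2^{(4)} & \psi_2^{(5)} \\
2\psi_2^{(4)} & 2\psi_2^{(5)} & 0 & 0 & 0 & 0 \\
2\psi_3^{(4)} & 2\psi_3^{(5)} & \psi_2^{(4)} & \psi_2^{(5)} & 0 & 0 \\ \hline
0 & 0 & 2\psi_1^{(4)} & 2\psi_1^{(5)} & \psi_3^{(4)} & \psi_3^{(5)} \\
\psi_3^{(4)} & \psi_3^{(5)} & 2\psi_2^{(4)} & 2\psi_2^{(5)} & 0 & 0 \\
0 & 0 & 2\psi_3^{(4)} & 2\psi_3^{(5)} & 0 & 0
\end{array}\right]. \tag{28}
\]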



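Let $\Psi_2$ be the (2 x 2) sub-matrix of the Cauchy matrix $\Psi_3$, given by

\[
\Psi_2 = \begin{bmatrix} \psi_2^{(4)} & \psi_2^{(5)} \\ \psi_3^{(4)} & \psi_3^{(5)} \end{bmatrix}. \tag{29}
\]

Since every sub-matrix of $\Psi_3$ is non-singular, so is $\Psi_2$. Keeping in mind the fact that the data collector
can perform any linear operation on the columns of $B_2$, we next multiply the last two columns of $B_2$ by
$\Psi_2^{-1}$ (while leaving the other 4 columns unchanged) to obtain the matrix

\[
B_3 = \left[\begin{array}{cc|cc|cc}
2\psi_1^{(4)} & 2\psi_1^{(5)} & 0 & 0 & 1 & 0 \\
2\psi_2^{(4)} & 2\psi_2^{(5)} & 0 & 0 & 0 & 0 \\
2\psi_3^{(4)} & 2\psi_3^{(5)} & \psi_2^{(4)} & \psi_2^{(5)} & 0 & 0 \\ \hline
0 & 0 & 2\psi_1^{(4)} & 2\psi_1^{(5)} & 0 & 1 \\
\psi_3^{(4)} & \psi_3^{(5)} & 2\psi_2^{(4)} & 2\psi_2^{(5)} & 0 & 0 \\
0 & 0 & 2\psi_3^{(4)} & 2\psi_3^{(5)} & 0 & 0
\end{array}\right]. \tag{30}
\]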



The message symbols associated with the last two columns of the resulting matrix are now available to the
data-collector and their effect on the rest of the encoded symbols can be subtracted out to get

\[
B_4 = \begin{bmatrix}
2\psi_2^{(4)} & 2\psi_2^{(5)} & 0 & 0 \\
2\psi_3^{(4)} & 2\psi_3^{(5)} & \psi_2^{(4)} & \psi_2^{(5)} \\
\psi_3^{(4)} & \psi_3^{(5)} & 2\psi_2^{(4)} & 2\psi_2^{(5)} \\
0 & 0 & 2\psi_3^{(4)} & 2\psi_3^{(5)}
\end{bmatrix}. \tag{31}
\]

Along the lines of the previous case, the matrix $B_4$ above can be shown to be non-singular. We note that
this condition is equivalent to the reconstruction in a MISER code with $k = 2$ and a data-collector that
attempts to recover the data by connecting to the two parity nodes. ■
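
Cases (a) to (d) together cover every choice of three nodes. As a numerical cross-check, the following Python sketch (an illustration, not part of the original proof) builds the six nodal generator matrices of the example and computes the rank over F_7 of each of the 20 possible (9 x 9) concatenations. It assumes the Cauchy matrix with rows (5, 4, 1), (2, 5, 4), (3, 2, 5), with column c assigned to parity node 4 + c, and the scalar 2; the helper names are hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations

P, EPS, K = 7, 2, 3
PSI = [[5, 4, 1], [2, 5, 4], [3, 2, 5]]   # assumed: column c is psi^(4 + c)

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p (Gauss-Jordan elimination)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def nodal_generators():
    """Generator matrices (9 x 3 each) of nodes 1..6, returned as a list."""
    gens = []
    for l in range(K):                        # systematic nodes
        G = [[0] * K for _ in range(K * K)]
        for j in range(K):
            G[K * l + j][j] = 1
        gens.append(G)
    for c in range(K):                        # parity nodes
        col = [PSI[r][c] for r in range(K)]
        G = [[0] * K for _ in range(K * K)]
        for i in range(K):                    # component i
            for j in range(K):                # column j
                if i == j:                    # eps * psi in the i-th column
                    for r in range(K):
                        G[K * i + r][j] = EPS * col[r] % P
                else:                         # psi_i on the diagonal
                    G[K * i + j][j] = col[i]
        gens.append(G)
    return gens

def concat(gens, nodes):
    """Columnwise concatenation of the generator matrices of the given nodes."""
    return [sum((gens[n][row] for n in nodes), []) for row in range(K * K)]

gens = nodal_generators()
ranks = {nodes: rank_mod_p(concat(gens, nodes), P)
         for nodes in combinations(range(2 * K), K)}
print(len(ranks), "node triples; ranks found:", set(ranks.values()))
```

A rank of 9 for every triple corresponds to the MDS (reconstruction) property of the example code.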




B. The General MISER Code for n = 2k, d = n — 1 

In this section, the construction of the MISER code for the general parameter set $n = 2k$, $d = n - 1$ is
provided. Since the MISER code is built to satisfy the cut-set bound, we have that $d = \alpha + k - 1$, which
implies that

\[
k = \alpha. \tag{32}
\]

This relation will play a key role in the design of generator matrices for the parity nodes as this will
permit each parity node to reserve $\alpha = k$ symbols associated to linearly independent global kernels for
the repair of the $k$ systematic nodes. In the example just examined, we had $\alpha = k = 3$. The construction
of the MISER code for the general parameter set $n = 2k$, $d = n - 1$ is very much along the lines of the
construction of the example code.

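1) Design of Nodal Generator Matrices:

The first $k$ nodes are systematic and store the message symbols in uncoded form. Thus the component
generator matrices $G_i^{(\ell)}$, $1 \le i \le k$, of the $\ell$th systematic node, $1 \le \ell \le k$, are given by

\[
G_i^{(\ell)} = \begin{cases} I_\alpha & \text{if } i = \ell \\ 0_\alpha & \text{if } i \ne \ell. \end{cases} \tag{33}
\]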



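Let $\Psi$ be an ($\alpha \times (n - k)$) matrix with entries drawn from $\mathbb{F}_q$ such that every sub-matrix of $\Psi$ is of
full rank. Since $n - k = \alpha = k$, we have that $\Psi$ is a square matrix.^1 Let the columns of $\Psi$ be given by

\[
\Psi = \left[ \underline{\psi}^{(k+1)} \ \ \underline{\psi}^{(k+2)} \ \cdots \ \underline{\psi}^{(n)} \right] \tag{34}
\]

where the $m$th column is given by

\[
\underline{\psi}^{(m)} = \left[ \psi_1^{(m)} \ \cdots \ \psi_\alpha^{(m)} \right]^t. \tag{35}
\]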



A Cauchy matrix is an example of such a matrix, and in our construction, we will assume $\Psi$ to be a
Cauchy matrix.

Definition 1 (Cauchy matrix): An ($s \times t$) Cauchy matrix $\Psi$ over a finite field $\mathbb{F}_q$ is a matrix whose
$(i, j)$th element ($1 \le i \le s$, $1 \le j \le t$) equals $\frac{1}{x_i - y_j}$, where $\{x_i\} \cup \{y_j\}$ is an injective sequence, i.e., a
sequence with no repeated elements.

Thus the minimum field size required for the construction of an ($s \times t$) Cauchy matrix is $s + t$. Hence
if we choose $\Psi$ to be a Cauchy matrix,

\[
q \ge \alpha + n - k. \tag{36}
\]

Any finite field satisfying this condition will suffice for our construction. Note that since $n - k \ge \alpha \ge 2$,
we have $q \ge 4$.
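
Definition 1 can be made concrete with a short Python sketch over a prime field (a sketch under stated assumptions, not the authors' code). The parameters x = (3, 4, 5) and y = (0, 1, 2) are a hypothetical choice; over F_7 they yield the matrix with rows (5, 4, 1), (2, 5, 4), (3, 2, 5), whose entries agree with those of the matrix used in the [6, 3, 5] example, up to the orientation assumed here. The function names are illustrative, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations

def cauchy_matrix(xs, ys, p):
    """(i, j)th entry is 1 / (x_i - y_j) over the prime field F_p."""
    elems = [v % p for v in list(xs) + list(ys)]
    if len(set(elems)) != len(elems):
        raise ValueError("the x's and y's must be pairwise distinct in F_p")
    return [[pow((x - y) % p, -1, p) for y in ys] for x in xs]

def det_mod_p(m, p):
    """Determinant by cofactor expansion; adequate for the small sizes used here."""
    if len(m) == 1:
        return m[0][0] % p
    total = 0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det_mod_p(minor, p)
    return total % p

def every_submatrix_nonsingular(m, p):
    """Check that every square sub-matrix of m has a non-zero determinant in F_p."""
    s, t = len(m), len(m[0])
    for size in range(1, min(s, t) + 1):
        for rows in combinations(range(s), size):
            for cols in combinations(range(t), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if det_mod_p(sub, p) == 0:
                    return False
    return True

psi = cauchy_matrix([3, 4, 5], [0, 1, 2], 7)
print(psi)
print(every_submatrix_nonsingular(psi, 7))
```

The distinctness check in `cauchy_matrix` is what forces the field to have at least s + t elements.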



^1 In Section V-D we extend the construction to the even more general case of arbitrary $n$, $d \ge 2k - 1$, under the added requirement
however, that the replacement node connect to all of the remaining systematic nodes. In that section, we will be dealing with a rectangular
($\alpha \times (n - k)$) matrix $\Psi$.

We introduce some additional notation at this point. Denote the $j$th column of the ($\alpha \times \alpha$) matrix $G_i^{(m)}$
as $\underline{g}_{i,j}^{(m)}$, i.e.,

\[
G_i^{(m)} = \left[ \underline{g}_{i,1}^{(m)} \ \cdots \ \underline{g}_{i,\alpha}^{(m)} \right]. \tag{37}
\]

The code is designed assuming a regeneration algorithm under which each of the $\alpha$ parity nodes passes
its $\ell$th column for repair of the $\ell$th systematic node. With this in mind, for $k + 1 \le m \le n$, $1 \le i, j \le \alpha$,
we choose

\[
\underline{g}_{i,j}^{(m)} = \begin{cases} \epsilon\, \underline{\psi}^{(m)} & \text{if } i = j \\ \psi_i^{(m)}\, \underline{e}_j & \text{if } i \ne j \end{cases} \tag{38}
\]

where $\epsilon$ is an element from $\mathbb{F}_q$ such that $\epsilon \ne 0$ and $\epsilon^2 \ne 1$ (in the example provided in the previous
section, $\epsilon \in \mathbb{F}_7$ was set equal to 2). The latter condition $\epsilon^2 \ne 1$ is needed during the reconstruction process,
as was seen in the example. Note that there always exists such a value $\epsilon$ as long as $q \ge 4$.
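
The existence claim for the scalar can be checked directly in small prime fields. The sketch below is an illustration restricted to prime q, since integer arithmetic modulo q does not model extension fields such as F_4; the function name is hypothetical. It lists the non-zero elements whose square differs from 1.

```python
def admissible_eps(p):
    """Non-zero elements eps of the prime field F_p with eps^2 != 1."""
    return [e for e in range(1, p) if (e * e) % p != 1]

for p in (2, 3, 5, 7):
    print(p, admissible_eps(p))
```

Since the equation x^2 = 1 has at most two roots in a field, at least q - 3 admissible values remain, which is why q >= 4 suffices.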

As in the example, the generator matrix is also designed keeping in mind the need for interference 
alignment. This property is utilized in the exact-repair of systematic nodes, as described in the next section. 

2) Exact-Repair of Systematic Nodes: The repair process we associate with the MISER code is simple.
The repair of a failed systematic node, say node $\ell$, involves each of the remaining $d = n - 1$ nodes
passing their $\ell$th symbols (or equivalently, associated global kernels) respectively. In the set of $\alpha$ vectors
passed by the parity nodes, the $\ell$th (desired) component is independent, and the remaining (interference)
components are aligned. The interference components are cancelled using the vectors passed by the
remaining systematic nodes. Independence in the desired component then allows for recovery of the
desired message symbols.

The next theorem describes the repair algorithm in greater detail.

Theorem 2: In the MISER code, a failed systematic node can be exactly repaired by downloading one
symbol from each of the remaining $d = n - 1$ nodes.

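Proof: Consider repair of the systematic node $\ell$. Each of the remaining $(n - 1)$ nodes passes its $\ell$th
column, so that the replacement node has access to the global kernels represented by the columns shown
below, where the first $k - 1$ columns come from the systematic nodes and the last $\alpha$ columns from the parity nodes:

\[
\left[\begin{array}{cccccc|ccc}
\underline{e}_\ell & \cdots & \underline{0} & \underline{0} & \cdots & \underline{0} & \psi_1^{(k+1)} \underline{e}_\ell & \cdots & \psi_1^{(n)} \underline{e}_\ell \\
\vdots & \ddots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
\underline{0} & \cdots & \underline{e}_\ell & \underline{0} & \cdots & \underline{0} & \psi_{\ell-1}^{(k+1)} \underline{e}_\ell & \cdots & \psi_{\ell-1}^{(n)} \underline{e}_\ell \\
\underline{0} & \cdots & \underline{0} & \underline{0} & \cdots & \underline{0} & \epsilon\, \underline{\psi}^{(k+1)} & \cdots & \epsilon\, \underline{\psi}^{(n)} \\
\underline{0} & \cdots & \underline{0} & \underline{e}_\ell & \cdots & \underline{0} & \psi_{\ell+1}^{(k+1)} \underline{e}_\ell & \cdots & \psi_{\ell+1}^{(n)} \underline{e}_\ell \\
\vdots & & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
\underline{0} & \cdots & \underline{0} & \underline{0} & \cdots & \underline{e}_\ell & \psi_k^{(k+1)} \underline{e}_\ell & \cdots & \psi_k^{(n)} \underline{e}_\ell
\end{array}\right]
\]

where $\underline{e}_\ell$ denotes the $\ell$th unit vector of length $\alpha$ and $\underline{0}$ denotes a zero vector of length $\alpha$.

Observe that apart from the desired $\ell$th component, every other component is aligned along the vector
$\underline{e}_\ell$. The goal is to show that some $\alpha$ linear combinations of the columns above will give us a matrix whose
$\ell$th component equals the ($\alpha \times \alpha$) identity matrix, and has zeros everywhere else. But this is clear from
the interference alignment structure just noted in conjunction with linear independence of the $\alpha$ vectors
in the desired component:

\[
\{ \underline{\psi}^{(k+1)}, \cdots, \underline{\psi}^{(n)} \}. \tag{39}
\]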
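
The argument in the proof can be checked numerically for a parameter set other than the example. The following Python sketch is an illustration under stated assumptions: k = 4 (so n = 8 and alpha = 4), the prime field F_11, a Cauchy matrix built from the hypothetical choice x = (0, 1, 2, 3), y = (4, 5, 6, 7), and the scalar 2 (which satisfies 2^2 != 1 in F_11). The function names are illustrative and nodes are numbered from 0. For each systematic node it verifies that the columns of the failed node lie in the span of the n - 1 vectors passed by the helpers.

```python
P, K, EPS = 11, 4, 2
XS, YS = [0, 1, 2, 3], [4, 5, 6, 7]      # hypothetical, pairwise distinct
PSI = [[pow((x - y) % P, -1, P) for y in YS] for x in XS]   # Cauchy matrix

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p (Gauss-Jordan elimination)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def generator_column(node, l):
    """l-th column (length K*K) of the generator matrix of `node`; nodes >= K are parity."""
    v = [0] * (K * K)
    if node < K:                          # systematic node: a unit vector
        v[K * node + l] = 1
        return v
    col = [PSI[r][node - K] for r in range(K)]
    for i in range(K):
        if i == l:                        # desired component: eps * psi
            for r in range(K):
                v[K * i + r] = EPS * col[r] % P
        else:                             # interference: psi_i * e_l
            v[K * i + l] = col[i]
    return v

def can_repair(l):
    """True if the columns of systematic node l lie in the span of the helper vectors."""
    helpers = [generator_column(m, l) for m in range(2 * K) if m != l]
    wanted = [generator_column(l, j) for j in range(K)]
    return rank_mod_p(helpers, P) == rank_mod_p(helpers + wanted, P)

print([can_repair(l) for l in range(K)])
```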



Next, we discuss the data reconstruction property. 

3) Data Reconstruction (MDS Property): For reconstruction to be satisfied, a data-collector downloading
all symbols stored in any arbitrary $k$ nodes should be able to recover the $B$ message symbols. For
this, we need the ($B \times B$) matrix formed by the columnwise concatenation of any arbitrary collection of
$k$ nodal generator matrices to be non-singular. The proof of this property is along the lines of the proof
in the example. For completeness, a proof is presented in the appendix.

Theorem 3: A data-collector connecting to any k nodes in the MISER code can recover all the B 
message symbols. 

Proof: Please see the Appendix. ■ 

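Remark 1: It is easily verified that both reconstruction and repair properties continue to hold even when
we choose the generator matrices of the parity nodes $\underline{g}_{i,j}^{(m)}$, $k + 1 \le m \le n$, $1 \le i, j \le \alpha$, to be given by:

\[
\underline{g}_{i,j}^{(m)} = \begin{cases} \Sigma_i\, \underline{\psi}^{(m)} & \text{if } i = j \\ \psi_i^{(m)}\, \underline{e}_j & \text{if } i \ne j \end{cases} \tag{40}
\]

where $\Sigma_i = \mathrm{diag}\{\epsilon_{i,1}, \ldots, \epsilon_{i,\alpha}\}$ is an ($\alpha \times \alpha$) diagonal matrix satisfying

1) $\epsilon_{i,j} \ne 0$, $\forall\, i, j$

2) $\epsilon_{i,j}\, \epsilon_{j,i} \ne 1$, $\forall\, i \ne j$.

The first condition suffices to ensure exact-repair of systematic nodes. The two conditions together ensure
that the (MDS) reconstruction property holds as well.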

C. The MISER Code for $n \ge 2k$, $d = n - 1$

In this section we show how the MISER code construction for $n = 2k$, $d = n - 1$ can be extended to
the more general case $n \ge 2k$, $d = n - 1$. From the cut-set bound ([5]), for this parameter regime, we get

\[
k \le \alpha. \tag{41}
\]

We begin by first showing how an incremental change in parameters is possible. 

Theorem 4: An $[n, k, d]$ linear, systematic, exact-repair MSR code $C$ can be derived from an $[n' =
n + 1, k' = k + 1, d' = d + 1]$ linear, systematic, exact-repair MSR code $C'$. Furthermore, if $d' = ak' + b$
in code $C'$, then $d = ak + b + (a - 1)$ in code $C$.
Proof: We begin by noting that

\[
n - k = n' - k' \tag{42}
\]
\[
\alpha' = \alpha = d - k + 1 \tag{43}
\]
\[
B' = k'(d' - k' + 1) = B + \alpha. \tag{44}
\]



In essence, we use code shortening [22] to derive code $C$ from code $C'$. Specification of code $C$ requires
that given a collection of $B = k\alpha$ message symbols, we identify the $\alpha$ code symbols stored in each of
the $n$ nodes. We assume without loss of generality, that in code $C$, the nodes are numbered 1 through $n$,
with nodes 1 through $k$ representing the systematic nodes. We next create an additional node numbered
0.

The encoding algorithm for code $C$ is based on the encoding algorithm for code $C'$. Given a collection
of $B$ message symbols to be encoded by code $C$, we augment this collection by an additional $\alpha$ message
symbols all of which are set equal to zero. The first set of $B$ message symbols will be stored in systematic
nodes 1 through $k$ and the string of $\alpha$ zeros will be stored in node 0. Nodes 0 through $k$ are then regarded
as constituting a set of $k' = (k + 1)$ systematic nodes for code $C'$. The remaining $(n - k)$ parity nodes
are filled using the encoding process associated with code $C'$ using the message symbols stored in the $k'$
nodes numbered 0 through $k$. Note that both codes $C$ and $C'$ share the same number $(n - k)$ of parity
nodes.

To prove the data reconstruction property of $C$, it suffices to prove that all the $B$ message symbols can
be recovered by connecting to an arbitrary set of $k$ nodes. Given a data-collector connecting to a particular
set of $k$ nodes, we examine the corresponding scenario in code $C'$ in which the data-collector connects
to node 0 in addition to these $k$ nodes. By the assumed MDS property of code $C'$, all the $B$ message
symbols along with the $\alpha$ message symbols stored in node 0 can be decoded using the data stored in these
$(k + 1)$ nodes. However, since the $\alpha$ symbols stored in node 0 are all set equal to zero, they clearly play
no part in the data-reconstruction process. It follows that the $B$ message symbols can be recovered using
the data from the $k$ nodes (leaving aside node 0), thereby establishing that code $C$ possesses the required
MDS data-reconstruction property.

Fig. 5: Construction of a [n = 5, k = 2, d = 4] MISER code from a [n' = 6, k' = 3, d' = 5] MISER code. Shortening
the code with respect to node zero is equivalent to removing systematic node 0 as well as the top component of every nodal
generator matrix. The resulting [n = 5, k = 2, d = 4] MISER code has $\{u_4, \ldots, u_9\}$ as its $B = k\alpha = 6$ message symbols.

A similar argument can be used to establish the repair property of code $C$ as well. Finally, we have

\[
d' = ak' + b \ \Rightarrow\ d + 1 = a(k + 1) + b \ \Rightarrow\ d = ak + b + (a - 1). \qquad ■
\]



By iterating the procedure in the proof of Theorem 4 above $i$ times we obtain:

Corollary 5: An $[n, k, d]$ linear, systematic, exact-repair MSR code $C$ can be constructed by shortening
a $[n' = n + i, k' = k + i, d' = d + i]$ linear, systematic, exact-repair MSR code $C'$. Furthermore, if $d' = ak' + b$
in code $C'$, then $d = ak + b + i(a - 1)$ in code $C$.



Remark 2: It is shown in the sequel (Section VI-B) that every linear, exact-repair MSR code can be
made systematic. Thus, Theorem 4 and Corollary 5 apply to any linear, exact-repair MSR code (not just
systematic). In addition, note that the theorem and the associated corollary hold for general values of
$[n, k, d]$ and are not restricted to the case of $d = n - 1$. Furthermore, a little thought will show that they
apply to linear codes $C'$ that perform functional repair as well.

The next corollary follows from Corollary 5 and the code-shortening method employed in
Theorem 4.

Corollary 6: The MISER code for $n \ge 2k$, $d = n - 1$ can be obtained by shortening the MISER code
for $n' = n + (n - 2k)$, $k' = k + (n - 2k)$, $d' = d + (n - 2k) = n' - 1$.



Example: The code-shortening procedure represented by Theorem 4 is illustrated by the example shown
in Fig. 5. Here it is shown how a MISER code having code parameters $[n' = 6, k' = 3, d' = 5]$, $\beta' = 1$ and
$(\alpha' = d' - k' + 1 = 3,\ B' = \alpha' k' = 9)$ yields, upon shortening with respect to the message symbols in node 0,
a MISER code having code parameters $[n = 5, k = 2, d = 4]$, $\beta = 1$ and $(\alpha = d - k + 1 = 3,\ B = \alpha k = 6)$.
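
The shortening step of this example can be reproduced numerically. The Python sketch below is an illustration under stated assumptions: the parent [6, 3, 5] code is built over F_7 from the Cauchy matrix with rows (5, 4, 1), (2, 5, 4), (3, 2, 5) and the scalar 2, its first systematic node plays the role of node 0, and the helper names are hypothetical. After dropping that node and the top component of every remaining nodal generator matrix, the sketch checks reconstruction from any two of the five nodes and repair of both remaining systematic nodes from d = 4 helpers.

```python
from itertools import combinations

P, EPS, K = 7, 2, 3
PSI = [[5, 4, 1], [2, 5, 4], [3, 2, 5]]   # assumed: column c is one parity node

def rank_mod_p(rows, p):
    """Rank of a matrix over the prime field F_p (Gauss-Jordan elimination)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, p)
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def parent_generators():
    """Nodal generator matrices (9 x 3) of the parent [6, 3, 5] code."""
    gens = []
    for l in range(K):
        G = [[0] * K for _ in range(K * K)]
        for j in range(K):
            G[K * l + j][j] = 1
        gens.append(G)
    for c in range(K):
        col = [PSI[r][c] for r in range(K)]
        G = [[0] * K for _ in range(K * K)]
        for i in range(K):
            for j in range(K):
                if i == j:
                    for r in range(K):
                        G[K * i + r][j] = EPS * col[r] % P
                else:
                    G[K * i + j][j] = col[i]
        gens.append(G)
    return gens

# Shortening: drop the first systematic node and the top component of the others
short = [G[K:] for G in parent_generators()[1:]]      # 5 nodes, each 6 x 3

def hstack(mats):
    return [sum((m[r] for m in mats), []) for r in range(len(mats[0]))]

mds_ok = all(rank_mod_p(hstack([short[a], short[b]]), P) == 2 * K
             for a, b in combinations(range(5), 2))

def repair_ok(s):
    """Repair of shortened systematic node s (0 or 1) from the other four nodes."""
    col = s + 1                           # column index inherited from the parent code
    passed = [[G[r][col] for r in range(2 * K)]
              for n, G in enumerate(short) if n != s]
    wanted = [[short[s][r][j] for r in range(2 * K)] for j in range(K)]
    return rank_mod_p(passed, P) == 4 and rank_mod_p(passed + wanted, P) == 4

print(mds_ok, repair_ok(0), repair_ok(1))
```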

D. Extension to $2k - 1 \le d \le n - 1$ When The Set of Helper Nodes Includes All Remaining Systematic
Nodes

In this section, we present a simple extension of the MISER code to the case when $2k - 1 \le d \le n - 1$,
under the additional constraint however, that the set of $d$ helper nodes assisting a failed systematic node
includes the remaining $k - 1$ systematic nodes. The theorem below shows that the code provided in
Section V-B for $n = 2k$, $d = n - 1$ supports the case $d = 2k - 1$, $d \le n - 1$ as well, as long as this
additional requirement is met. From here on, extension to the general case $d \ge 2k - 1$, $d \le n - 1$ is
straightforward via the code-shortening result in Theorem 4. Note that unlike in the previous instance,
the ($\alpha \times (n - k)$) Cauchy matrix used in the construction for $d < n - 1$ is a rectangular matrix.

Theorem 7: For $d = 2k - 1$, $d \le n - 1$, the code defined by the nodal generator matrices in
equations (33) and (38) achieves reconstruction and optimal, exact-repair of systematic nodes, provided
the replacement node connects to all the remaining systematic nodes.

Proof: Reconstruction: The reconstruction property follows directly from the reconstruction property
in the case of the original code.

Exact-repair of systematic nodes: The replacement node connects to the $(k - 1)$ remaining systematic
nodes and an arbitrary $\alpha$ parity nodes (since meeting the cut-set bound requires $d = k - 1 + \alpha$). Consider
a distributed storage system having only these $(k - 1 + \alpha)$ nodes along with the failed node as its $n$ nodes.
Such a system has $d = n - 1$, $d = 2k - 1$ and is identical to the system described in Section V-B. Hence
exact-repair of systematic nodes meeting the cut-set bound is guaranteed. ■

E. Analysis of the MISER Code 

a) Field Size Required: The constraint on the field size arises from the construction of the ($\alpha \times (n - k)$)
matrix $\Psi$ having all sub-matrices of full rank. For our constructions, since $\Psi$ is chosen to be a Cauchy
matrix, any field of size $(n + d - 2k + 1)$ or higher suffices. For specific parameters, the matrix $\Psi$ can be
handcrafted to yield smaller field sizes.

b) Complexity of Exact-Repair of Systematic Nodes: Each node participating in the exact-repair of
systematic node $i$ simply passes its $i$th symbol, without any processing. The replacement node has to
multiply the inverse of an ($\alpha \times \alpha$) Cauchy matrix with an $\alpha$-length vector and then perform $(k - 1)$
subtractions for interference cancellation.

c) Complexity of Reconstruction: The complexity analysis is provided for the case $n = 2k$, $d = n - 1$;
other cases follow along similar lines. A data-collector connecting to the $k$ systematic nodes can recover
all the data without any additional processing. A data-collector connecting to some $k$ arbitrary nodes
has to (in the worst case) multiply the inverse of a ($k \times k$) Cauchy matrix with $k$ vectors, along with
operations having a lower order of complexity.



F. Relation to Subsequent Work [14]

Two regenerating codes are equivalent if one code can be transformed into the other via a non-singular
symbol remapping (this definition is formalized in Section VI-B). The capabilities and properties of
equivalent codes are thus identical in every way.

The initial presentation of the MISER code in (W\ (the name 'MISER' was coined only subsequently) 
provided the construction of the code along with two (of three) parts of what may be termed as a complete 
decoding algorithm, namely: (a) reconstruction by a data collector, and (b) exact-repair of failed systematic 
nodes. It was not known whether the third part of decoding, i.e., repair of a failed parity node, could be
carried out by the MISER code. Following the initial presentation of the MISER code, the authors of [14]
show how a common eigenvector approach can be used to establish that exact repair of the parity nodes
is also possible under the MISER code construction.^2

VI. Necessity of Interference Alignment and Non-Existence of Scalar, Linear, Exact-Repair MSR Codes for $d < 2k - 3$

In Section V, explicit, exact-repair MSR codes are constructed for the parameter regimes ($d \ge 2k - 1$,
$d = n - 1$) performing reconstruction and exact-repair of systematic nodes. These constructions are
based on the concept of interference alignment. Furthermore, these codes have a desirable property of
having the smallest possible value for the parameter $\beta$, i.e., $\beta = 1$.

As previously discussed in Section I-C, the problem of constructing exact-repair MSR codes is (in
part) a non-multicast network coding problem. In particular, for the case of $\beta = 1$, it reduces to a scalar
network coding problem. Upon increase in the value of $\beta$, the capacity of every data pipe is increased
by a factor of $\beta$, thereby transforming it into a vector network coding problem. Thus, $\beta = 1$ corresponds
to the absence of symbol extension, which in general, reduces the complexity of system implementation.

Furthermore, as noted in Section I-A, an MSR code for every larger integer value of $\beta$ can be obtained
by concatenating multiple copies of a $\beta = 1$ code. For this reason, the case of $\beta = 1$ is of special interest
and a large section of the literature in the field of regenerating codes ( [|7|-pT|, p3| , pl| , p7| , p8| ) is 
devoted to this case. 

In the present section, we show that for $d < 2k - 3$, there exist no linear, exact-repair MSR codes
achieving the cut-set bound on the repair bandwidth in the absence of symbol extension. In fact, we show
that the cut-set bound cannot be achieved even if exact-repair of only the systematic nodes is desired. We
first assume the existence of such a linear, exact-repair MSR code $C$ satisfying:

\[
(\beta = 1,\ B = k\alpha,\ \alpha = d - k + 1) \tag{45}
\]

and

\[
(d < 2k - 3 \ \Rightarrow\ \alpha < k - 2). \tag{46}
\]

Subsequently, we derive properties that this code must necessarily satisfy. Many of these properties hold
for a larger regime of parameters and are therefore of independent interest. In particular, we prove that
interference alignment, in the form described in Section IV, is necessary. We will show that when $d <
2k - 3$ the system becomes over-constrained, leading to a contradiction.
We begin with some additional notation.

Remark 3: In recent work, subsequent to the original submission of this paper, it is shown in [15],
[16] that the MSR point under exact-repair can be achieved asymptotically for all $[n, k, d]$ via an infinite
symbol extension, i.e., in the limit as $\beta \to \infty$. This is established by presenting a scheme under which
lim/3_j.oo d« = 1- Note that in the asymptotic setup, since both a, 5 are multiples of (3, these two parameters 
tend to infinity as well. 

A. Additional Notation 

We introduce some additional notation for the vectors passed by the helper nodes to the replacement
node. For $\ell, m \in \{1, \ldots, n\}$, $\ell \ne m$, let $\underline{\gamma}^{(m,\ell)}$ denote the vector passed by node $m$ for repair of node $\ell$.
In keeping with our component notation, we will use $\underline{\gamma}_i^{(m,\ell)}$ to denote the $i$th component, $1 \le i \le k$, of
this vector.

Recall that a set of vectors are aligned when the vector-space spanned by them has a dimension no
more than one. Given a matrix $A$, we denote its column-space by colspace[$A$] and its (right) null space
by nullspace[$A$]. Clearly, $\underline{\gamma}^{(m,\ell)} \in$ colspace[$G^{(m)}$].

^2 In [14], a class of regenerating codes is presented that have the same parameters as does the MISER code. This class of codes can
however, be shown to be equivalent to the MISER code (and hence to each other) under the equivalence notion presented in Section VI-B.

B. Equivalent Codes

Two codes $C$ and $C'$ are equivalent if $C'$ can be represented in terms of $C$ by

i) a change of basis of the vector space generated by the message symbols (i.e., a remapping of the
message symbols), and
ii) a change of basis of the column-spaces of the nodal generator matrices (i.e., a remapping of the
symbols stored within a node).
A more rigorous definition is as follows.

Definition 2 (Equivalent Codes): Two codes $C$ and $C'$ are equivalent if

\[
G'^{(m)} = W\, G^{(m)}\, U^{(m)} \tag{47}
\]
\[
\underline{\gamma}'^{(m,\ell)} = W\, \underline{\gamma}^{(m,\ell)} \tag{48}
\]

$\forall\ \ell, m \in \{1, \ldots, n\}$, $\ell \ne m$, for some ($B \times B$) non-singular matrix $W$, and some ($\alpha \times \alpha$) non-singular
matrix $U^{(m)}$.

Since the only operator required to transform a code to its equivalent is a symbol remapping, the 
capabilities and properties of equivalent codes are identical in every respect. Hence, in the sequel, we will 
not distinguish between two equivalent codes and the notion of code equivalence will play an important 
role in the present section. Here, properties of a code that is equivalent to a given code are first derived and 
the equivalence then guarantees that these properties hold for the given code as well. The next theorem uses 
the notion of equivalent codes to show that every linear exact-repair MSR code can be made systematic. 

Theorem 8: Every linear, exact-repair MSR code can be made systematic via a non-singular linear 
transformation of the rows of the generator matrix, which simply corresponds to a re-mapping of the 
message symbols. Furthermore, the choice of the k nodes that are to be made systematic can be arbitrary. 
Proof: 

Let the generator matrix of the given linear, exact-repair MSR code $C$ be $G$. We will derive an equivalent
code $C'$ that has its first $k$ nodes in systematic form. The reconstruction (MDS property) of code $C$ implies
that the ($B \times B$) sub-matrix of $G$,

\[
\left[ G^{(1)} \ \ G^{(2)} \ \cdots \ G^{(k)} \right]
\]

is non-singular. Define an equivalent code $C'$ having its generator matrix $G'$ as:

\[
G' = \left[ G^{(1)} \ \ G^{(2)} \ \cdots \ G^{(k)} \right]^{-1} G. \tag{49}
\]

Clearly, the $B$ left-most columns of $G'$ form a $B \times B$ identity matrix, thus making the equivalent code
$C'$ systematic. As the repair is exact, the code will retain the systematic form following any number of
failures and repairs.

The transformation in equation (49) can involve any arbitrary set of $k$ nodes in $C$, thus proving the
second part of the theorem. ■
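
The row transformation in the proof can be illustrated on a toy code. The Python sketch below is not one of the codes of this paper: it assumes a hypothetical scalar [n = 4, k = 2] MDS code over F_7 with alpha = 1, and makes its second and fourth nodes (indices 1 and 3 when counting from 0) systematic by left-multiplying the generator matrix by the inverse of the corresponding sub-matrix. The function names are illustrative, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
P = 7

def mat_inv_mod_p(m, p):
    """Inverse of a square matrix over the prime field F_p (Gauss-Jordan)."""
    n = len(m)
    a = [[x % p for x in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c]), None)
        if piv is None:
            raise ValueError("matrix is singular over F_p")
        a[c], a[piv] = a[piv], a[c]
        inv = pow(a[c][c], -1, p)
        a[c] = [x * inv % p for x in a[c]]
        for r in range(n):
            if r != c and a[r][c]:
                f = a[r][c]
                a[r] = [(x - f * y) % p for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]

def mat_mul_mod_p(a, b, p):
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]

def make_systematic(G, nodes, alpha, p):
    """Return M^{-1} G, where M collects the columns of the chosen k nodes."""
    cols = [n * alpha + j for n in nodes for j in range(alpha)]
    M = [[row[c] for c in cols] for row in G]
    return mat_mul_mod_p(mat_inv_mod_p(M, p), G, p)

# Hypothetical toy generator matrix: every pair of columns is independent in F_7
G = [[1, 1, 1, 1],
     [1, 2, 3, 4]]
G_sys = make_systematic(G, nodes=[1, 3], alpha=1, p=P)
print(G_sys)
```

The columns of the two chosen nodes in the result form an identity matrix, while the row space, and hence the set of codewords, is unchanged.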

The theorem above permits us to restrict our attention to the class of systematic codes, and assume the
first $k$ nodes (i.e., nodes $1, \ldots, k$) to be systematic. Recall that, for systematic node $\ell$ ($\in \{1, \ldots, k\}$),

\[
G_i^{(\ell)} = \begin{cases} I_\alpha & \text{if } i = \ell \\ 0_\alpha & \text{if } i \ne \ell. \end{cases} \tag{50}
\]

Thus, systematic node $\ell$ stores the $\alpha$ symbols in $\underline{u}_\ell$.

C. Approach

An exact-repair MSR code should be capable of performing exact-repair of any failed node by connecting
to any arbitrary subset of $d$ of the remaining $(n - 1)$ nodes, while meeting the cut-set bound on repair
bandwidth. This requires a number of repair scenarios to be satisfied. Our proof of non-existence considers
a less restrictive setting, in which exact-repair of only the systematic nodes is to be satisfied. Further, we
consider only the situation where a failed systematic node is to be repaired by downloading data from
a specific set of $d$ nodes, comprised of the $(k - 1)$ remaining systematic nodes, and some collection of
$\alpha$ parity nodes. Thus, for the remainder of this section, we will restrict our attention to a subset of the
$n$ nodes in the distributed storage network, of size $(k + \alpha)$ nodes, namely, the set of $k$ systematic nodes
and the first $\alpha$ parity nodes. Without loss of generality, within this subset, we will assume that nodes 1
through $k$ are the systematic nodes and that nodes $(k + 1)$ through $(k + \alpha)$ are the $\alpha$ parity nodes. Then
with this notation, upon failure of systematic node $\ell$, $1 \le \ell \le k$, the replacement node is assumed to
connect to nodes $\{1, \ldots, k + \alpha\} \setminus \{\ell\}$.

The generator matrix $G$ of the entire code can be written in a block-matrix form as shown in Fig. 6.
In the figure, each (block) column represents a node and each (block) row, a component. The first $k$ and
the remaining $\alpha$ columns contain respectively, the generator matrices of the $k$ systematic nodes and the
$\alpha$ parity nodes.

Fig. 6: The generator matrix $G$ of the entire code. First $k$ (block) columns are associated with the systematic nodes 1 to $k$
and the next $\alpha$ (block) columns to the parity nodes $(k + 1)$ to $(k + \alpha)$. Empty blocks denote zero matrices.



We now outline the steps involved in proving the non-existence result. Along the way, we will uncover
some interesting and insightful properties possessed by linear, exact-repair MSR codes.

1) We begin by establishing that in order to satisfy the data reconstruction property, each sub-matrix in
the parity-node section of the generator matrix (see Fig. 6) must be non-singular.

2) Next, we show that the vectors passed by the $\alpha$ parity nodes for the repair of any systematic node
must necessarily satisfy two properties:

• alignment of the interference components, and

• linear independence of the desired component.

3) We then prove that in the collection of $k$ vectors passed by a parity node for the respective repair
of the $k$ systematic nodes, every $\alpha$-sized subset must be linearly independent. This is a key step that
links the vectors stored in a node to those passed by it, and enables us to replace the $\alpha$ columns of
the generator matrix of a parity node with the vectors it passes to aid in the repair of some subset of
$\alpha$ systematic nodes. We will assume that these $\alpha$ systematic nodes are in fact, nodes 1 through $\alpha$.

4) Finally, we will show that the necessity of satisfying multiple interference-alignment conditions
simultaneously turns out to be over-constraining, forcing alignment in the desired components as
well. This leads to a contradiction, thereby proving the non-existence result.

D. Deduced Properties

Property 1 (Non-singularity of the Component Submatrices): Each of the component submatrices
$\{G_i^{(m)} \mid k+1 \le m \le k+\alpha,\ 1 \le i \le k\}$ is non-singular.

Proof: Consider a data-collector connecting to systematic nodes 2 to $k$ and parity node $(k+1)$. The
data-collector thus has access to the block matrix shown in Fig. 7.

Fig. 7: The block matrix accessed by a data-collector connecting to systematic nodes 2 through $k$ and parity node $(k+1)$.

For the data-collector to recover all the data, this block matrix must be non-singular, forcing $G_1^{(k+1)}$
to be non-singular. A similar argument shows that the same must hold for each of the other
component submatrices. ■
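The block structure behind this argument can be sketched as follows. The sketch assumes (our reading of the systematic form described for Fig. 6, not a statement taken from the figure itself) that systematic node $i$ has $I_\alpha$ as its $i$th component and zero blocks elsewhere:

```latex
\left[\, G^{(2)} \ \cdots \ G^{(k)} \ \ G^{(k+1)} \,\right]
=
\begin{bmatrix}
0        & \cdots & 0        & G_1^{(k+1)} \\
I_\alpha &        &          & G_2^{(k+1)} \\
         & \ddots &          & \vdots      \\
         &        & I_\alpha & G_k^{(k+1)}
\end{bmatrix},
\qquad
\det\left[\, G^{(2)} \ \cdots \ G^{(k)} \ \ G^{(k+1)} \,\right]
= \pm \det G_1^{(k+1)} .
```

Under this assumption, expanding along the identity blocks leaves only $G_1^{(k+1)}$, so the accessed matrix is non-singular exactly when $G_1^{(k+1)}$ is.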



Corollary 9: Let $H = [H_1^t \ H_2^t \ \cdots \ H_k^t]^t$ be a $(k\alpha \times \ell)$ matrix each of whose $\ell \ge 1$ columns is a linear
combination of the columns of $G^{(m)}$ for some $m \in \{k+1, \ldots, k+\alpha\}$, and having $k$ components $\{H_i\}$
of size $(\alpha \times \ell)$. Thus

$$\mathrm{colspace}[H] \subseteq \mathrm{colspace}[G^{(m)}]. \tag{51}$$

Then for every $i \in \{1, \ldots, k\}$, we have

$$\mathrm{nullspace}[H_i] = \mathrm{nullspace}[H]. \tag{52}$$

Proof: Clearly,

$$\mathrm{nullspace}[H] \subseteq \mathrm{nullspace}[H_i]. \tag{53}$$

Let $H = G^{(m)} A$ for some $(\alpha \times \ell)$ matrix $A$. Then

$$H_i = G_i^{(m)} A. \tag{54}$$

For a vector $v \in \mathrm{nullspace}[H_i]$,

$$H_i v = G_i^{(m)} A v = 0.$$

However, since $G_i^{(m)}$ is of full rank (Property 1), it follows that

$$A v = 0 \tag{55}$$
$$\Rightarrow \ G^{(m)} A v = H v = 0 \tag{56}$$
$$\Rightarrow \ \mathrm{nullspace}[H_i] \subseteq \mathrm{nullspace}[H]. \tag{57}$$

■

The corollary says, in essence, that any linear dependence relation that holds amongst the columns of
any of the components $H_i$ also extends to the columns of the entire matrix $H$ itself.
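The statement of the corollary can be checked numerically on a small instance. The sketch below is illustrative only: the field size 11, the dimensions, and the helper names `rank_mod_p`, `matmul` and `random_nonsingular` are our assumptions. Since $\mathrm{nullspace}[H] \subseteq \mathrm{nullspace}[H_i]$ always holds and both matrices have $\ell$ columns, the two null spaces coincide exactly when the ranks are equal, which is what the code compares.

```python
import random

P = 11  # small prime field, chosen only for illustration


def rank_mod_p(rows, p=P):
    """Rank of a matrix (given as a list of rows) over GF(p)."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def matmul(a, b, p=P):
    """Matrix product over GF(p)."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)]
            for row in a]


def random_nonsingular(n, rng, p=P):
    """Draw random n x n matrices until one is non-singular (Property 1)."""
    while True:
        m = [[rng.randrange(p) for _ in range(n)] for _ in range(n)]
        if rank_mod_p(m, p) == n:
            return m


rng = random.Random(0)
alpha, k, ell = 3, 4, 5
components = [random_nonsingular(alpha, rng) for _ in range(k)]  # G_1 .. G_k
A = [[rng.randrange(P) for _ in range(ell)] for _ in range(alpha)]

H_components = [matmul(G_i, A) for G_i in components]  # H_i = G_i A
H = [row for H_i in H_components for row in H_i]       # H = G A, stacked

ranks = [rank_mod_p(H_i) for H_i in H_components]
same_nullspace = all(r == rank_mod_p(H) for r in ranks)
```

Here `same_nullspace` being true reflects equation (52) for every component; making one of the `components` singular would in general break the equality.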
We next establish properties that are mandated by the repair capabilities of exact regenerating codes.
Consider the situation where a failed systematic node, say node $\ell$, $1 \le \ell \le k$, is repaired using one
vector (as $\beta = 1$) from each of the remaining $k - 1 + \alpha$ nodes.

Definition 3: When considering repair of systematic node $\ell$, $1 \le \ell \le k$, the $\ell$th component $\{\gamma_\ell^{(m,\ell)}\}$
of each of the $\alpha$ vectors $\{\gamma^{(m,\ell)} \mid k+1 \le m \le k+\alpha\}$ passed by the $\alpha$ parity nodes will be termed
the desired component. The remaining components $\{\gamma_i^{(m,\ell)} \mid i \ne \ell\}$ will be termed interference
components.

The next property highlights the necessity of interference alignment in any exact-repair MSR code.
Clearly, the vectors passed by the remaining $(k-1)$ systematic nodes have $\ell$th component equal to 0,
and thus the onus of recovering the 'desired' $\ell$th component of replacement node $\ell$ falls on the $\alpha$ parity
nodes. However, the vectors passed by the parity nodes have non-zero 'interference' components that
can be nulled out only by the vectors passed by the systematic nodes. This forces an alignment in these
interference components, and this is shown more formally below.

Property 2 (Necessity of Interference Alignment): In the vectors $\{\gamma^{(m,\ell)} \mid k+1 \le m \le k+\alpha\}$ passed
by the $\alpha$ parity nodes for the repair of any systematic node (say, node $\ell$), the set of $\alpha$ interference
components $\{\gamma_i^{(m,\ell)}\}$, $1 \le i \le k$, $i \ne \ell$, must necessarily be aligned, and the desired components $\{\gamma_\ell^{(m,\ell)}\}$
must necessarily be linearly independent.

Proof: We assume without loss of generality that $\ell = 1$, i.e., we consider repair of systematic node
1. The matrix depicted in Fig. 8 consists of the $\alpha$ vectors needed to be recovered in the replacement node
1, alongside the $d$ vectors passed by the $d$ helper nodes $2, \ldots, k+\alpha$. This matrix may be decomposed
into three sub-matrices, namely: a $(B \times \alpha)$ matrix $\Gamma_1$, comprising the $\alpha$ columns to be recovered at the
replacement node; a $(B \times (k-1))$ matrix $\Gamma_2$, comprising the $(k-1)$ vectors passed by the remaining
systematic nodes; and a $(B \times \alpha)$ matrix $\Gamma_3$, comprising the $\alpha$ vectors passed by the parity nodes.

Fig. 8: Matrix depicting the $\alpha$ (global-kernel) vectors to be recovered by replacement node 1 (represented by the matrix $\Gamma_1$),
alongside the $d$ vectors passed by the helper nodes $2, \ldots, k+\alpha$ (represented by $[\Gamma_2 \mid \Gamma_3]$).

The vectors $\{\gamma_1^{(k+1,1)}, \ldots, \gamma_1^{(k+\alpha,1)}\}$ appearing in the first row of the matrix constitute the desired
component; for every $i \in \{2, \ldots, k\}$, the set of vectors $\{\gamma_i^{(k+1,1)}, \ldots, \gamma_i^{(k+\alpha,1)}\}$ constitute interference
components. An exact-repair of node 1 is equivalent to the recovery of $\Gamma_1$ from the columns of $\Gamma_2$ and
$\Gamma_3$ through a linear transformation, and hence it must be that

$$\mathrm{colspace}[\Gamma_1] \subseteq \mathrm{colspace}[\Gamma_2 \mid \Gamma_3], \tag{58}$$

where the '$\mid$' operator denotes concatenation. When we restrict attention to the first components of the matrices,
we see that we must have

$$\mathrm{colspace}[I_\alpha] \subseteq \mathrm{colspace}\left[\gamma_1^{(k+1,1)} \ \cdots \ \gamma_1^{(k+\alpha,1)}\right], \tag{59}$$

thereby forcing the desired components $\{\gamma_1^{(k+1,1)}, \ldots, \gamma_1^{(k+\alpha,1)}\}$ to be linearly independent.

Further, from (58) it follows that

$$\mathrm{colspace}[\Gamma_1 \mid \Gamma_2] \subseteq \mathrm{colspace}[\Gamma_2 \mid \Gamma_3]. \tag{60}$$

Clearly, $\mathrm{rank}[\Gamma_1] = \alpha$, and from Fig. 8 it can be inferred that

$$\mathrm{rank}[\Gamma_1 \mid \Gamma_2] = \alpha + \mathrm{rank}[\Gamma_2]. \tag{61}$$

Moreover, as the first component in $\Gamma_3$ is of rank $\alpha$,

$$\mathrm{rank}[\Gamma_2 \mid \Gamma_3] \le \mathrm{rank}[\Gamma_2] + \alpha \tag{62}$$
$$= \mathrm{rank}[\Gamma_1 \mid \Gamma_2]. \tag{63}$$

It follows from equations (60) and (63) that

$$\mathrm{colspace}[\Gamma_1 \mid \Gamma_2] = \mathrm{colspace}[\Gamma_2 \mid \Gamma_3], \tag{64}$$

and this forces the interference components in $\Gamma_3$ to be aligned. Thus, for $i \in \{2, \ldots, k\}$,

$$\mathrm{colspace}\left[\gamma_i^{(k+1,1)} \ \cdots \ \gamma_i^{(k+\alpha,1)}\right] \subseteq \mathrm{colspace}\left[\gamma_i^{(i,1)}\right]. \tag{65}$$

Fig. 9: Table indicating the vectors passed by the $\alpha$ parity nodes to repair the first $\alpha$ systematic nodes.

Remark 4: Properties 1 and 2 also hold for all $\beta > 1$, in which case each of the $\alpha$ helper parity nodes
passes a $\beta$-dimensional subspace, and each interference component needs to be confined to a $\beta$-dimensional
subspace. Furthermore, the two properties also hold for all $[n, k, d]$ exact-repair MSR codes, when $(k-1)$
of the $d$ helper nodes along with the replacement node are viewed as systematic.

The next property links the vectors stored in a parity node to the vectors it passes to aid in the repair
of any set of $\alpha$ systematic nodes.

Property 3: For $d < 2k - 1$, the vectors passed by a parity node to repair any arbitrary set of $\alpha$
systematic nodes are linearly independent, i.e., for $m \in \{k+1, \ldots, k+\alpha\}$, it must be that every subset
of size $\alpha$ drawn from the set of vectors

$$\left\{\gamma^{(m,1)}, \ldots, \gamma^{(m,k)}\right\}$$

is linearly independent. (Thus the matrix $[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,k)}]$ may be viewed as the generator matrix of a
$[k, \alpha]$-MDS code.)

Proof: Consider Fig. 9, which depicts the vectors passed by parity nodes $\{k+1, \ldots, k+\alpha\}$ to
repair systematic nodes $\{1, \ldots, \alpha\}$. From Property 2 one can infer that in column $i \in \{1, \ldots, \alpha\}$, the
$i$th (desired) components of the $\alpha$ vectors are independent, and the $j$th (interference) components for all
$j \in \{1, \ldots, k\} \setminus \{i\}$ are aligned. In particular, for all $j \in \{\alpha+1, \ldots, k\}$, the $j$th components of each
column are aligned. Note that as $d < 2k - 1$ we have $k > \alpha$, which guarantees that the set $\{\alpha+1, \ldots, k\}$
is non-empty and hence the presence of an $(\alpha+1)$th component.

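We will prove Property 3 by contradiction. Suppose, for example, we were to have

$$\gamma^{(k+1,1)} \in \mathrm{colspace}\left[\gamma^{(k+1,2)} \ \cdots \ \gamma^{(k+1,\alpha)}\right], \tag{66}$$

which is an example situation under which the $\alpha$ vectors passed by parity node $(k+1)$ for the respective
repair of the first $\alpha$ systematic nodes would fail to be linearly independent. Restricting our attention to
component $(\alpha+1)$, we get

$$\gamma_{\alpha+1}^{(k+1,1)} \in \mathrm{colspace}\left[\gamma_{\alpha+1}^{(k+1,2)} \ \cdots \ \gamma_{\alpha+1}^{(k+1,\alpha)}\right]. \tag{67}$$

Now, alignment of component $(\alpha+1)$ along each column forces the same dependence in all other parity
nodes, i.e.,

$$\gamma_{\alpha+1}^{(m,1)} \in \mathrm{colspace}\left[\gamma_{\alpha+1}^{(m,2)} \ \cdots \ \gamma_{\alpha+1}^{(m,\alpha)}\right] \quad \forall\, m \in \{k+2, \ldots, k+\alpha\}. \tag{68}$$

Noting that a vector passed by a helper node lies in the column-space of its generator matrix, we now
invoke Corollary 9:

$$\mathrm{nullspace}\left[\gamma_{\alpha+1}^{(m,1)} \ \cdots \ \gamma_{\alpha+1}^{(m,\alpha)}\right] = \mathrm{nullspace}\left[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,\alpha)}\right] \quad \forall\, m \in \{k+1, \ldots, k+\alpha\}. \tag{69}$$

This, along with equations (67) and (68), implies

$$\gamma^{(m,1)} \in \mathrm{colspace}\left[\gamma^{(m,2)} \ \cdots \ \gamma^{(m,\alpha)}\right] \quad \forall\, m \in \{k+1, \ldots, k+\alpha\}. \tag{70}$$

Thus the dependence in the vectors passed by one parity node carries over to every other parity node.
In particular, we have

$$\gamma_1^{(m,1)} \in \mathrm{colspace}\left[\gamma_1^{(m,2)} \ \cdots \ \gamma_1^{(m,\alpha)}\right] \quad \forall\, m \in \{k+1, \ldots, k+\alpha\}. \tag{71}$$

However, from Property 2, we know that the vectors passed to systematic nodes 2 to $\alpha$ have their first
components aligned, i.e.,

$$\mathrm{rank}\left[\gamma_1^{(k+1,\ell)} \ \cdots \ \gamma_1^{(k+\alpha,\ell)}\right] \le 1 \quad \forall\, \ell \in \{2, \ldots, \alpha\}. \tag{72}$$

Aggregating all instantiations (w.r.t. $m$) of equation (71), the desired component is confined to:

$$\mathrm{colspace}\left[\left\{\gamma_1^{(m,1)}\right\}_{m=k+1}^{k+\alpha}\right] \subseteq \mathrm{colspace}\left[\left\{\gamma_1^{(m,\ell)}\right\}_{m=k+1,\ \ell=2}^{k+\alpha,\ \alpha}\right]$$
$$\Rightarrow \ \mathrm{rank}\left[\left\{\gamma_1^{(m,1)}\right\}_{m=k+1}^{k+\alpha}\right] \le \mathrm{rank}\left[\left\{\gamma_1^{(m,\ell)}\right\}_{m=k+1,\ \ell=2}^{k+\alpha,\ \alpha}\right]$$
$$\le \sum_{\ell=2}^{\alpha} \mathrm{rank}\left[\left\{\gamma_1^{(m,\ell)}\right\}_{m=k+1}^{k+\alpha}\right]$$
$$\le \alpha - 1, \tag{76}$$

where the last inequality follows from equation (72). This contradicts the assertion of Property 2 with
respect to the desired component:

$$\mathrm{rank}\left[\left\{\gamma_1^{(m,1)}\right\}_{m=k+1}^{k+\alpha}\right] = \alpha. \tag{77}$$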



Remark 5: It turns out that an attempted proof of the analogue of this theorem for the case $\beta > 1$
fails to hold.

The connection between the vectors passed by a parity node and those stored by it, resulting from
Property 3, is presented in the following corollary.

Corollary 10: If there exists a linear, exact-repair MSR code for $d < 2k - 1$, then there exists an
equivalent linear, exact-repair MSR code, where, for each parity node, the $\alpha$ columns of the generator
matrix are respectively the vectors passed for the repair of the first $\alpha$ systematic nodes.

Proof: Since a node can pass only a function of what it stores, the vectors passed by a parity node
$m \in \{k+1, \ldots, k+\alpha\}$ for repair of the systematic nodes must belong to the column-space of its
generator matrix, i.e.,

$$\gamma^{(m,1)}, \ldots, \gamma^{(m,\alpha)} \in \mathrm{colspace}\left[G^{(m)}\right]. \tag{78}$$

Further, Property 3 asserts that the vectors it passes for repair of the first $\alpha$ systematic nodes are linearly
independent, i.e.,

$$\mathrm{rank}\left[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,\alpha)}\right] = \alpha. \tag{79}$$

It follows that the generator matrix $G^{(m)}$ is a non-singular transformation of the vectors $[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,\alpha)}]$
that are passed for the repair of the first $\alpha$ systematic nodes, and the two codes with generator matrices
given by the two representations are hence equivalent. ■

In the equivalent code, each row of Fig. 9 corresponds to the generator matrix $G^{(m)}$ of the associated
parity node, i.e.,

$$G^{(m)} = \left[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,\alpha)}\right] \quad \forall\, m \in \{k+1, \ldots, k+\alpha\}. \tag{80}$$

Since the capabilities of a code are identical to those of an equivalent code, we will restrict our attention to this
generator matrix for the remainder of this section. The two properties that follow highlight some additional
structure in this code.

Property 4 (Code structure - what is stored): For $d < 2k - 1$, any component ranging from $(\alpha+1)$ to
$k$ across the generator matrices of the parity nodes differs only by the presence of a multiplicative diagonal
matrix on the right, i.e.,

$$\begin{array}{cccc}
G_{\alpha+1}^{(k+1)} = H_{\alpha+1}\Lambda_{\alpha+1}^{(k+1)}, & G_{\alpha+1}^{(k+2)} = H_{\alpha+1}\Lambda_{\alpha+1}^{(k+2)}, & \cdots, & G_{\alpha+1}^{(k+\alpha)} = H_{\alpha+1}\Lambda_{\alpha+1}^{(k+\alpha)} \\
\vdots & \vdots & & \vdots \\
G_{k}^{(k+1)} = H_{k}\Lambda_{k}^{(k+1)}, & G_{k}^{(k+2)} = H_{k}\Lambda_{k}^{(k+2)}, & \cdots, & G_{k}^{(k+\alpha)} = H_{k}\Lambda_{k}^{(k+\alpha)}
\end{array} \tag{81}$$

where the matrices of the form $\Lambda_*^{(*)}$ are $\alpha \times \alpha$ diagonal matrices (and where, for instance, we can choose
$H_{\alpha+1} = G_{\alpha+1}^{(k+1)}$, in which case $\Lambda_{\alpha+1}^{(k+1)} = I_\alpha$).

Proof: Consider the first column in Fig. 9, comprising the vectors passed by the $\alpha$ parity nodes
to repair node 1. Property 2 tells us that in these $\alpha$ vectors, the components ranging from $(\alpha+1)$ to
$k$ constitute interference, and are hence aligned. Clearly, the same statement holds for every column in
Fig. 9. Thus, the respective components across these columns are aligned. Since the generator matrices
of the parity nodes are as in (80), the result follows. ■

For the repair of a systematic node, a parity node passes a vector from the column-space of its generator
matrix, i.e., the vector $\gamma^{(m,\ell)}$ passed by parity node $m$ for repair of failed systematic node $\ell$ can be written
in the form:

$$\gamma^{(m,\ell)} = G^{(m)}\, \theta^{(m,\ell)} \tag{82}$$

for some $\alpha$-length vector $\theta^{(m,\ell)}$.

In the equivalent code obtained in (80), a parity node simply stores the $\alpha$ vectors it passes to repair
the first $\alpha$ systematic nodes. On the other hand, the vector passed to systematic node $\ell$, $\alpha+1 \le \ell \le k$,
is a linear combination of these $\alpha$ vectors. The next property employs Property 3 to show that every
coefficient in this linear combination is non-zero.

Property 5 (Code structure - what is passed): For $d < 2k - 1$, and a helper parity node $m$ assisting a
failed systematic node $\ell$:

(a) For $\ell \in \{1, \ldots, \alpha\}$, $\theta^{(m,\ell)} = e_\ell$, and

(b) For $\ell \in \{\alpha+1, \ldots, k\}$, every element of $\theta^{(m,\ell)}$ is non-zero.

Proof: Part (a) is a simple consequence of the structure of the code. We will prove part (b) by
contradiction. Suppose $\theta_\alpha^{(m,\ell)} = 0$ for some $\ell \in \{\alpha+1, \ldots, k\}$. Then $\gamma^{(m,\ell)}$ is a linear combination of
only the first $(\alpha - 1)$ columns of $G^{(m)}$. This implies

$$\gamma^{(m,\ell)} \in \mathrm{colspace}\left[\gamma^{(m,1)} \ \cdots \ \gamma^{(m,\alpha-1)}\right]. \tag{83}$$

This clearly violates Property 3, thus leading to a contradiction. ■

E. Proof of Non-existence 

We now present the main theorem of this section, namely, the non-achievability proof. The proof, in
essence, shows that the conditions of Interference Alignment necessary for exact-repair of systematic
nodes, coupled with the MDS property of the code, over-constrain the system, leading to alignment in
the desired components as well.

We begin with a toy example that will serve to illustrate the proof technique. Consider the case when
$[n = 7, k = 5, d = 6]$. Then it follows from (5) that $(\alpha = d - k + 1 = 2,\ B = k\alpha = 10)$. In this case, as
depicted in Fig. 10, in the vectors passed by parity nodes 6 and 7, (a) when repairing systematic node
3, there is alignment in components 4 and 5, and (b) when repairing systematic node 4, there is alignment
in component 5. It is shown that this, in turn, forces alignment in component 4 (the desired component)
during repair of node 4, which contradicts the assertion of Property 2 that the desired
components are linearly independent.

Fig. 10: A toy example, with parameters $[n = 7, k = 5, d = 6]$, to illustrate the proof of non-existence.
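The parameter arithmetic used in the toy example can be written as a short helper. This is a minimal sketch: the function names `msr_parameters` and `in_nonexistence_regime` are ours, and the formulas used are the MSR relations $\alpha = (d - k + 1)\beta$ and $B = k\alpha$, specialised to $\beta = 1$ as in this section.

```python
def msr_parameters(n, k, d, beta=1):
    """Return (alpha, B) for an [n, k, d] MSR code at the cut-set bound.

    Uses alpha = (d - k + 1) * beta and B = k * alpha.
    """
    if not (k <= d <= n - 1):
        raise ValueError("need k <= d <= n - 1")
    alpha = (d - k + 1) * beta
    return alpha, k * alpha


def in_nonexistence_regime(k, d):
    """True when d < 2k - 3, the regime addressed by the non-existence result."""
    return d < 2 * k - 3


# Toy example of this section: [n = 7, k = 5, d = 6].
alpha, B = msr_parameters(7, 5, 6)
toy_in_regime = in_nonexistence_regime(5, 6)
```

For the toy example this gives $\alpha = 2$ and $B = 10$, and since $6 < 2 \cdot 5 - 3 = 7$ the parameters fall inside the regime in which the non-existence argument applies.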



Theorem 11: Linear, exact-repair MSR codes achieving the cut-set bound on the repair-bandwidth do
not exist for $d < 2k - 3$ in the absence of symbol extension (i.e., when $\beta = 1$).

Proof: Recall that achieving the cut-set bound on the repair bandwidth in the absence of symbol
extension gives $d = k - 1 + \alpha$. For the parameter regime $d < 2k - 3$ under consideration, we get
$k \ge \alpha + 3$. Furthermore, since $\alpha > 1$,* we have $n \ge k + 2$ (as $n \ge d + 1 = k + \alpha$). Hence the system
contains at least $(\alpha + 3)$ systematic nodes and at least two parity nodes.

*As discussed previously in Section I, $\alpha = 1$ corresponds to a trivial scalar MDS code; hence, we omit this case from consideration.

We use Property 4 to express the generator matrix of any parity node, say node $m$, in the form:

$$G^{(m)} = \begin{bmatrix} G_1^{(m)} \\ \vdots \\ G_\alpha^{(m)} \\ H_{\alpha+1}\Lambda_{\alpha+1}^{(m)} \\ \vdots \\ H_k \Lambda_k^{(m)} \end{bmatrix}.$$

In this proof, we will use the notation $A \prec B$ to indicate that the matrices $A$ and $B$ are scalar multiples
of each other, i.e., $A = \kappa B$ for some non-zero scalar $\kappa$, and write $A \nprec B$ to indicate that matrices $A$ and
$B$ are not scalar multiples of each other.

We will restrict our attention to components $(\alpha+2)$ and $(\alpha+3)$. First, consider repair of systematic
node $(\alpha+1)$. By the interference alignment property, Property 2,

$$\gamma_{\alpha+2}^{(k+1,\alpha+1)} \prec \gamma_{\alpha+2}^{(k+2,\alpha+1)}, \tag{84}$$

i.e.,

$$G_{\alpha+2}^{(k+1)}\, \theta^{(k+1,\alpha+1)} \prec G_{\alpha+2}^{(k+2)}\, \theta^{(k+2,\alpha+1)} \tag{85}$$
$$H_{\alpha+2} \Lambda_{\alpha+2}^{(k+1)}\, \theta^{(k+1,\alpha+1)} \prec H_{\alpha+2} \Lambda_{\alpha+2}^{(k+2)}\, \theta^{(k+2,\alpha+1)} \tag{86}$$
$$\Lambda_{\alpha+2}^{(k+1)}\, \theta^{(k+1,\alpha+1)} \prec \Lambda_{\alpha+2}^{(k+2)}\, \theta^{(k+2,\alpha+1)}, \tag{87}$$

where equation (87) uses the non-singularity of $H_{\alpha+2}$ (which is a consequence of Property 1).

We will use the notation $\Theta^{(*,*)}$ to denote an $(\alpha \times \alpha)$ diagonal matrix, with the elements on its diagonal
as the respective elements in $\theta^{(*,*)}$. Observing that the matrices $\Lambda_*^{(*)}$ are diagonal matrices, we rewrite
equation (87) as

$$\Lambda_{\alpha+2}^{(k+1)}\, \Theta^{(k+1,\alpha+1)} \prec \Lambda_{\alpha+2}^{(k+2)}\, \Theta^{(k+2,\alpha+1)}. \tag{88}$$

Similarly, alignment conditions on the $(\alpha+3)$th component in the vectors passed for repair of systematic
node $(\alpha+1)$ give

$$\Lambda_{\alpha+3}^{(k+2)}\, \Theta^{(k+2,\alpha+1)} \prec \Lambda_{\alpha+3}^{(k+1)}\, \Theta^{(k+1,\alpha+1)}, \tag{89}$$

and those on the $(\alpha+3)$th component in the vectors passed for repair of systematic node $(\alpha+2)$ give

$$\Lambda_{\alpha+3}^{(k+1)}\, \Theta^{(k+1,\alpha+2)} \prec \Lambda_{\alpha+3}^{(k+2)}\, \Theta^{(k+2,\alpha+2)}. \tag{90}$$

Observe that in equations (88), (89) and (90), the matrices $\Lambda_*^{(*)}$ and $\Theta^{(*,*)}$ are non-singular, diagonal matrices
(the former by Property 1, the latter by Property 5). As a consequence, a product (of the respective terms on the left and right sides) of equations (88), (89)
and (90), followed by a cancellation of common terms, leads to:

$$\Lambda_{\alpha+2}^{(k+1)}\, \Theta^{(k+1,\alpha+2)} \prec \Lambda_{\alpha+2}^{(k+2)}\, \Theta^{(k+2,\alpha+2)}. \tag{91}$$

This is clearly in contradiction to Property 2, which mandates linear independence of the desired components
in vectors passed for repair of systematic node $(\alpha+2)$:

$$H_{\alpha+2} \Lambda_{\alpha+2}^{(k+1)}\, \theta^{(k+1,\alpha+2)} \nprec H_{\alpha+2} \Lambda_{\alpha+2}^{(k+2)}\, \theta^{(k+2,\alpha+2)}, \tag{92}$$

i.e.,

$$\Lambda_{\alpha+2}^{(k+1)}\, \Theta^{(k+1,\alpha+2)} \nprec \Lambda_{\alpha+2}^{(k+2)}\, \Theta^{(k+2,\alpha+2)}. \tag{93}$$

■
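The product-and-cancel step can be spelled out explicitly. The sketch below restates the three alignment relations (those on component $(\alpha+2)$ for repair of node $(\alpha+1)$, and on component $(\alpha+3)$ for repair of nodes $(\alpha+1)$ and $(\alpha+2)$) in the notation of the proof; the third relation and the intermediate line are our reading of the argument and rely only on the fact that non-singular diagonal matrices commute and can be cancelled.

```latex
\begin{align*}
\Lambda_{\alpha+2}^{(k+1)}\Theta^{(k+1,\alpha+1)} &\prec \Lambda_{\alpha+2}^{(k+2)}\Theta^{(k+2,\alpha+1)}, \\
\Lambda_{\alpha+3}^{(k+2)}\Theta^{(k+2,\alpha+1)} &\prec \Lambda_{\alpha+3}^{(k+1)}\Theta^{(k+1,\alpha+1)}, \\
\Lambda_{\alpha+3}^{(k+1)}\Theta^{(k+1,\alpha+2)} &\prec \Lambda_{\alpha+3}^{(k+2)}\Theta^{(k+2,\alpha+2)} \\[4pt]
\Longrightarrow\quad
\Lambda_{\alpha+2}^{(k+1)}\,\Theta^{(k+1,\alpha+1)}\,
\Lambda_{\alpha+3}^{(k+2)}\,\Theta^{(k+2,\alpha+1)}\,
\Lambda_{\alpha+3}^{(k+1)}\,\Theta^{(k+1,\alpha+2)}
&\prec
\Lambda_{\alpha+2}^{(k+2)}\,\Theta^{(k+2,\alpha+1)}\,
\Lambda_{\alpha+3}^{(k+1)}\,\Theta^{(k+1,\alpha+1)}\,
\Lambda_{\alpha+3}^{(k+2)}\,\Theta^{(k+2,\alpha+2)} \\[4pt]
\Longrightarrow\quad
\Lambda_{\alpha+2}^{(k+1)}\,\Theta^{(k+1,\alpha+2)}
&\prec
\Lambda_{\alpha+2}^{(k+2)}\,\Theta^{(k+2,\alpha+2)} .
\end{align*}
```

The factors $\Theta^{(k+1,\alpha+1)}$, $\Theta^{(k+2,\alpha+1)}$, $\Lambda_{\alpha+3}^{(k+1)}$ and $\Lambda_{\alpha+3}^{(k+2)}$ appear on both sides of the product and cancel, leaving a relation that involves only component $(\alpha+2)$ during repair of node $(\alpha+2)$, i.e., the desired component.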



VII. Explicit Codes for d = k + 1 

In this section, we give an explicit MSR code construction for the parameter set $[n, k, d = k+1]$,
capable of repairing any failed node with a repair bandwidth equal to that given by the cut-set bound.
This parameter set is relevant since

a) the total number of nodes $n$ in the system can be arbitrary (and is not constrained to be equal to $d+1$),
making the code pertinent for real-world distributed storage systems where it is natural for the system
to expand/shrink,

b) $k+1$ is the smallest value of the parameter $d$ that offers a reduction in repair bandwidth, making the
code suitable for networks with low connectivity.

The code is constructed for $\beta = 1$, i.e., the code does not employ any symbol extension. All subsequent
discussion in this section will implicitly assume $\beta = 1$.

For most values of the parameters $[n, k, d]$, $d = k+1$ falls under the $d < 2k - 3$ regime, where we have
shown (Section VI) that exact-repair is not possible. When repair is not exact, a nodal generator matrix is
liable to change after a repair process. Thus, for the code construction presented in this section, we drop
the global kernel viewpoint and refer directly to the symbols stored or passed.

As a build-up to the code construction, we first inspect the trivial case of $d = k$. In this case, the cut-set
lower bound on repair bandwidth is given by

$$d \ge k = B. \tag{94}$$

Thus the parameter regime $d = k$ mandates the repair bandwidth to be no less than the file size $B$, and
has the remaining parameters satisfying

$$(\alpha = 1,\ B = k). \tag{95}$$

An MSR code for these parameters is necessarily an $[n, k]$ scalar MDS code. Thus, in this code, node $i$
stores the symbol

$$\left(p_i^t u\right), \tag{96}$$

where $u$ is a $k$-length vector containing all the message symbols, and $\{p_i\}_{i=1}^{n}$ is a set of $k$-length vectors
such that any arbitrary $k$ of the $n$ vectors are linearly independent. Upon failure of a node, the replacement
node can connect to any arbitrary $d = k$ nodes and download one symbol each, thereby recovering the
entire message from which the desired symbol can be extracted.

Fig. 11: Evolution of a node through multiple repairs in the MSR $d = k+1$ code.



When $d = k+1$, the cut-set bound (5) gives

$$(\alpha = d - k + 1 = 2,\ B = \alpha k = 2k). \tag{97}$$

Let the $2k$ message symbols be the elements of the $2k$-dimensional column vector

$$\begin{bmatrix} u_1 \\ u_2 \end{bmatrix},$$

where $u_1$ and $u_2$ are $k$-length column vectors. In the case of $d = k+1$, a code analogous to the $d = k$
code would have node $i$ storing the two symbols:

$$\left(p_i^t u_1,\ p_i^t u_2\right). \tag{98}$$

Maintaining the code as in (98), after one or more node repairs, necessitates exact repair of any failed
node. Since in this regime exact-repair is not possible for most values of the parameters, we allow an
auxiliary component in our code, as described below.

In our construction, the symbols stored in the nodes are initialized as in (98). On repair of a failed
node, the code allows for an auxiliary component in the second symbol. Thus, under this code, the two
symbols stored in node $i$, $1 \le i \le n$, are

$$\Big(\underbrace{p_i^t u_1,\ \ p_i^t u_2}_{\text{Exact component}} + \underbrace{r_i^t u_1}_{\text{Auxiliary component}}\Big), \tag{99}$$

where $r_i$ is a $k$-length vector corresponding to the auxiliary component. Further, the value of $r_i$ may
alter when node $i$ undergoes repair. Hence we term this repair process approximately-exact-repair.
For a better understanding, the system can be viewed as analogous to a Z-channel; this is depicted in
Fig. 11, where the evolution of a node through successive repair operations is shown. In the latter half of
this section, we will see that the set of vectors $\{r_i\}_{i=1}^{n}$ do not, at any point in time, influence either the
reconstruction or the repair process.

We now proceed to a formal description of the code construction. 



A. Code Construction: 

Let $\{p_i\}_{i=1}^{n}$ be a set of $k$-length vectors such that any arbitrary $k$ of the $n$ vectors are linearly independent.
Further, let $\{r_i\}_{i=1}^{n}$ be a set of $k$-length vectors initialized to arbitrary values. Unlike $\{p_i\}$, the vectors
$\{r_i\}$ do not play a role either in reconstruction or in repair. In our code, node $i$ stores the two symbols:

$$\left(p_i^t u_1,\ p_i^t u_2 + r_i^t u_1\right). \tag{100}$$

Fig. 12: A sample MSR $d = k+1$ code for the parameters $[n = 8, k = 5, d = 6]$, $(\beta = 1, \alpha = 2, B = 10)$, over $\mathbb{F}_{11}$.
Also depicted is the repair of node 8, assisted by helper nodes 1 to 6.



Upon failure of a node, the exact component, as the name suggests, is exactly repaired. However, the
auxiliary component may undergo a change. The net effect is what we term approximately-exact-repair.

The code is defined over the finite field $\mathbb{F}_q$ of size $q$. The sole restriction on $q$ comes from the construction
of the set of vectors $\{p_i\}_{i=1}^{n}$ such that every subset of $k$ vectors is linearly independent. For instance,
these vectors can be chosen from the rows of an $(n \times k)$ Vandermonde matrix or an $(n \times k)$ Cauchy
matrix, in which case any finite field of size $q \ge n$ or $q \ge n + k$ respectively will suffice.
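The Vandermonde choice can be checked by brute force for the sample parameters. The sketch below is an illustration under our own assumptions: the evaluation points $1, \ldots, 8$ in $\mathbb{F}_{11}$ and the helper name `det_mod_p` are not taken from the paper. It verifies that every $k$-subset of the rows has a non-zero determinant.

```python
from itertools import combinations

Q = 11        # field size, matching the F_11 example
N, K = 8, 5   # [n = 8, k = 5]


def det_mod_p(matrix, p=Q):
    """Determinant of a square matrix over GF(p) via Gaussian elimination."""
    m = [[x % p for x in row] for row in matrix]
    n = len(m)
    det = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c]), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det = (det * m[c][c]) % p
        inv = pow(m[c][c], -1, p)  # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            f = (m[r][c] * inv) % p
            if f:
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[c])]
    return det % p


# Row i is p_i = (1, x_i, x_i^2, ..., x_i^{k-1}) for distinct points x_i.
points = list(range(1, N + 1))  # hypothetical choice of 8 distinct field elements
p_vectors = [[pow(x, j, Q) for j in range(K)] for x in points]

subsets = list(combinations(range(N), K))
all_independent = all(
    det_mod_p([p_vectors[i] for i in subset]) != 0 for subset in subsets
)
```

Distinct evaluation points suffice here because a square Vandermonde determinant is the product of pairwise differences of its points, which is non-zero in a field whenever the points are distinct.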

Example: Fig. 12 depicts a sample code construction over $\mathbb{F}_{11}$ for the parameters $[n = 8, k = 5, d = 6]$
with $\beta = 1$, giving $(\alpha = 2, B = 10)$. Here,

The two theorems below show that the code described above is an $[n, k, d = k+1]$ MSR code by
establishing, respectively, the reconstruction and the repair properties of the code.

Theorem 12 (Reconstruction, i.e., MDS property): In the code presented, all the $B$ message symbols
can be recovered by a data-collector connecting to any arbitrary $k$ nodes.

Proof: Due to symmetry we assume (without loss of generality) that the data-collector connects to
the first $k$ nodes. Then the data-collector obtains access to the $2k$ symbols stored in the first $k$ nodes:

$$\left\{ p_i^t u_1,\ \ p_i^t u_2 + r_i^t u_1 \right\}_{i=1}^{k}. \tag{101}$$

By construction, the vectors $\{p_i\}_{i=1}^{k}$ are linearly independent, allowing the data-collector to recover the
first message vector $u_1$. Next, the data-collector subtracts the effect of $u_1$ from the second term. Finally,
in a manner analogous to the decoding of $u_1$, the data-collector recovers the second message vector $u_2$. ■
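The two-stage decoding in this proof can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes Vandermonde vectors $p_i$ over $\mathbb{F}_{11}$ with points $1, \ldots, 8$, randomly drawn $r_i$, and that the data-collector knows the current vectors $r_i$ of the nodes it contacts; the names `solve_mod_p`, `dot` and `reconstruct` are ours.

```python
import random

Q = 11
N, K = 8, 5


def solve_mod_p(a, b, p=Q):
    """Solve a x = b over GF(p) for a non-singular square matrix a."""
    n = len(a)
    m = [[x % p for x in row] + [y % p] for row, y in zip(a, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c])  # fails if singular
        m[c], m[pivot] = m[pivot], m[c]
        inv = pow(m[c][c], -1, p)
        m[c] = [(x * inv) % p for x in m[c]]
        for r in range(n):
            if r != c and m[r][c]:
                f = m[r][c]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[c])]
    return [row[n] for row in m]


def dot(u, v, p=Q):
    return sum(x * y for x, y in zip(u, v)) % p


rng = random.Random(1)
p_vecs = [[pow(x, j, Q) for j in range(K)] for x in range(1, N + 1)]  # assumed Vandermonde rows
r_vecs = [[rng.randrange(Q) for _ in range(K)] for _ in range(N)]     # arbitrary auxiliary vectors
u1 = [rng.randrange(Q) for _ in range(K)]
u2 = [rng.randrange(Q) for _ in range(K)]

# Node i stores (p_i^t u1, p_i^t u2 + r_i^t u1), as in (100).
stored = [(dot(p, u1), (dot(p, u2) + dot(r, u1)) % Q)
          for p, r in zip(p_vecs, r_vecs)]


def reconstruct(nodes):
    """Recover (u1, u2) from the symbols of any k nodes (0-based indices)."""
    P_k = [p_vecs[i] for i in nodes]
    u1_hat = solve_mod_p(P_k, [stored[i][0] for i in nodes])
    # Subtract the auxiliary contribution r_i^t u1 before decoding u2.
    second = [(stored[i][1] - dot(r_vecs[i], u1_hat)) % Q for i in nodes]
    u2_hat = solve_mod_p(P_k, second)
    return u1_hat, u2_hat


recovered = reconstruct([0, 2, 3, 5, 7])
```

Any other choice of five node indices should recover the same pair, since every $k$-subset of the $p_i$ is linearly independent.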
Theorem 13 (Node repair): In the code presented, approximately-exact-repair of any failed node can
be achieved by connecting to an arbitrary subset of $d$ $(= k+1)$ of the remaining $(n-1)$ nodes.

Proof: Due to symmetry, it suffices to consider the case where helper nodes $\{1, \ldots, k+1\}$ assist in
the repair of another failed node $f$. The two symbols stored in node $f$ prior to failure are

$$\left(p_f^t u_1,\ \ p_f^t u_2 + r_f^t u_1\right).$$

However, since repair is guaranteed to be only approximately exact, it suffices for the replacement node
to obtain

$$\left(p_f^t u_1,\ \ p_f^t u_2 + \tilde{r}_f^t u_1\right),$$

where $\tilde{r}_f$ is an arbitrary vector that need not be identical to $r_f$.

The helper nodes $\{1, \ldots, k+1\}$ pass one symbol each, formed by a linear combination of the symbols
stored in them. More specifically, helper node $i$, $1 \le i \le k+1$, under our repair algorithm, passes the
symbol

$$\lambda_i \left(p_i^t u_1\right) + \left(p_i^t u_2 + r_i^t u_1\right). \tag{102}$$

We introduce some notation at this point. For $\ell \in \{k, k+1\}$, let $P_\ell$ be an $(\ell \times k)$ matrix comprising
the vectors $p_1^t, \ldots, p_\ell^t$ as its $\ell$ rows respectively. Let $R_\ell$ be a second $(\ell \times k)$ matrix comprising the
vectors $r_1^t, \ldots, r_\ell^t$ as its $\ell$ rows respectively. Further, let $\Lambda_\ell = \mathrm{diag}\{\lambda_1, \ldots, \lambda_\ell\}$ be an $(\ell \times \ell)$ diagonal
matrix. In terms of these matrices, the $k+1$ symbols obtained by the replacement node can be written
as the $(k+1)$-length vector

$$\left(\Lambda_{k+1} P_{k+1} + R_{k+1}\right) u_1 + \left(P_{k+1}\right) u_2. \tag{103}$$

The precise values of the scalars $\{\lambda_i\}_{i=1}^{k+1}$ are derived below.

Recovery of the First Symbol: Let $\rho$ be the linear combination of the received symbols that the
replacement node takes to recover the first symbol that was stored in the failed node, i.e., we need

$$\rho^t \left( \left(\Lambda_{k+1} P_{k+1} + R_{k+1}\right) u_1 + \left(P_{k+1}\right) u_2 \right) = p_f^t u_1. \tag{104}$$

This requires elimination of $u_2$, i.e., we need

$$\rho^t P_{k+1} = 0^t. \tag{105}$$

To accomplish this, we first choose

$$\rho = \begin{bmatrix} \rho_1 \\ -1 \end{bmatrix}, \tag{106}$$

and in order to satisfy equation (105), we set

$$\rho_1^t = p_{k+1}^t P_k^{-1}. \tag{107}$$

Note that the $(k \times k)$ matrix $P_k$ is non-singular by construction.
Now as $u_2$ is eliminated, to obtain $p_f^t u_1$, we need

$$\rho^t \left(\Lambda_{k+1} P_{k+1} + R_{k+1}\right) = p_f^t \tag{108}$$
$$\Rightarrow \ \rho_1^t \left(\Lambda_k P_k + R_k\right) = p_f^t + \left(\lambda_{k+1} p_{k+1}^t + r_{k+1}^t\right). \tag{109}$$

Choosing $\lambda_{k+1} = 0$ and substituting the value of $\rho_1$ from equation (107), a few straightforward manipulations
yield that choosing

$$\Lambda_k = \left(\mathrm{diag}\left[p_{k+1}^t P_k^{-1}\right]\right)^{-1} \mathrm{diag}\left[\left(p_f^t - p_{k+1}^t P_k^{-1} R_k + r_{k+1}^t\right) P_k^{-1}\right] \tag{110}$$

satisfies equation (109), thereby enabling the replacement node to exactly recover the first symbol. The
non-singularity of the matrix $\mathrm{diag}\left[p_{k+1}^t P_k^{-1}\right]$ used here is justified as follows. Consider

$$\left[p_{k+1}^t P_k^{-1}\right] P_k = p_{k+1}^t. \tag{111}$$

Now, if any element of $\left[p_{k+1}^t P_k^{-1}\right]$ is zero, it would imply that a linear combination of $(k-1)$ rows
of $P_k$ can yield $p_{k+1}^t$. However, this contradicts the linear independence of every subset of $k$ vectors in
$\{p_i\}_{i=1}^{n}$.

Recovery of the Second Symbol: Since the scalars $\{\lambda_i\}_{i=1}^{k+1}$ have already been utilized in the exact
recovery of the first symbol, we are left with fewer degrees of freedom. This, in turn, gives rise to the
presence of an auxiliary term in the second symbol.

Let $\delta$ be the linear combination of the received symbols that the replacement node takes to obtain its
second symbol $\left(p_f^t u_2 + \tilde{r}_f^t u_1\right)$, i.e., we need

$$\delta^t \left( \left(\Lambda_{k+1} P_{k+1} + R_{k+1}\right) u_1 + \left(P_{k+1}\right) u_2 \right) = p_f^t u_2 + \tilde{r}_f^t u_1. \tag{112}$$

Since the vector $\tilde{r}_f$ is allowed to take any arbitrary value, the condition in (112) is reduced to the
requirement

$$\delta^t P_{k+1} = p_f^t. \tag{113}$$

To accomplish this, we first choose

$$\delta = \begin{bmatrix} \delta_1 \\ 0 \end{bmatrix}, \tag{114}$$

where, in order to satisfy equation (113), we choose

$$\delta_1^t = p_f^t P_k^{-1}. \tag{115}$$

■

In the example provided in Fig. 12, node 8 is repaired by downloading one symbol each from nodes 1
to 6. The linear combination coefficients used by the helper nodes are:

$$[\lambda_1 \ \cdots \ \lambda_6] = [6 \ 1 \ 3 \ 3 \ 1 \ 0].$$

The replacement node retains the exact part, and obtains a different auxiliary part, with $r_8 = [6 \ 2 \ 4 \ 7 \ 9]$.
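The repair procedure of Theorem 13 can be exercised end to end on a small instance. The following is a minimal sketch under our own assumptions, not the code of Fig. 12: the $p_i$ are Vandermonde rows over $\mathbb{F}_{11}$ with points $1, \ldots, 8$, the $r_i$ and the message are random, and the helper names are ours. It computes $\rho_1$, $\delta_1$ and the scalars $\lambda_i$ from (107), (110) and (115), with $\lambda_{k+1} = 0$, and checks both recovered symbols.

```python
import random

Q = 11
N, K = 8, 5


def solve_mod_p(a, b, p=Q):
    """Solve a x = b over GF(p) for a non-singular square matrix a."""
    n = len(a)
    m = [[x % p for x in row] + [y % p] for row, y in zip(a, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c])
        m[c], m[pivot] = m[pivot], m[c]
        inv = pow(m[c][c], -1, p)
        m[c] = [(x * inv) % p for x in m[c]]
        for r in range(n):
            if r != c and m[r][c]:
                f = m[r][c]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[c])]
    return [row[n] for row in m]


def dot(u, v, p=Q):
    return sum(x * y for x, y in zip(u, v)) % p


def transpose(m):
    return [list(col) for col in zip(*m)]


rng = random.Random(2)
p_vecs = [[pow(x, j, Q) for j in range(K)] for x in range(1, N + 1)]
r_vecs = [[rng.randrange(Q) for _ in range(K)] for _ in range(N)]
u1 = [rng.randrange(Q) for _ in range(K)]
u2 = [rng.randrange(Q) for _ in range(K)]
stored = [(dot(p, u1), (dot(p, u2) + dot(r, u1)) % Q)
          for p, r in zip(p_vecs, r_vecs)]

failed = 7                     # node 8 (0-based index)
helpers = list(range(K + 1))   # nodes 1..k+1
last = helpers[K]
P_k = [p_vecs[i] for i in helpers[:K]]
R_k = [r_vecs[i] for i in helpers[:K]]
PkT = transpose(P_k)

rho1 = solve_mod_p(PkT, p_vecs[last])      # rho_1^t = p_{k+1}^t P_k^{-1}, eq. (107)
delta1 = solve_mod_p(PkT, p_vecs[failed])  # delta_1^t = p_f^t P_k^{-1}, eq. (115)

# w^t = p_f^t - rho_1^t R_k + r_{k+1}^t, and x^t = w^t P_k^{-1}
rho_R = [dot(rho1, col) for col in transpose(R_k)]
w = [(pf - a + rl) % Q for pf, a, rl in zip(p_vecs[failed], rho_R, r_vecs[last])]
x = solve_mod_p(PkT, w)
# Eq. (110): lambda_i = x_i / rho_{1,i}; lambda_{k+1} = 0.
lam = [(xi * pow(ri, -1, Q)) % Q for xi, ri in zip(x, rho1)] + [0]

# Each helper passes lambda_i (p_i^t u1) + (p_i^t u2 + r_i^t u1), eq. (102).
passed = [(l * stored[i][0] + stored[i][1]) % Q for l, i in zip(lam, helpers)]

first = (dot(rho1, passed[:K]) - passed[K]) % Q   # rho = [rho_1; -1]
second = dot(delta1, passed[:K])                  # delta = [delta_1; 0]

# New auxiliary vector: r_tilde^t = delta_1^t (Lambda_k P_k + R_k).
M = [[(lam[i] * P_k[i][j] + R_k[i][j]) % Q for j in range(K)] for i in range(K)]
r_tilde = [dot(delta1, col) for col in transpose(M)]
```

In this sketch `first` equals $p_f^t u_1$ exactly, while `second` equals $p_f^t u_2 + \tilde{r}_f^t u_1$ for the new vector `r_tilde`, which in general differs from the original $r_f$.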

VIII. Conclusions

This paper considers the problem of constructing MDS regenerating codes achieving the cut-set bound
on repair bandwidth, and presents four major results. First, the construction of an explicit code, termed
the MISER code, that is capable of performing data reconstruction as well as optimal exact-repair of
the systematic nodes, is presented. The construction is based on the concept of interference alignment.
Second, we show that interference alignment is, in fact, necessary to enable exact-repair in an MSR code.
Third, using the necessity of interference alignment as a stepping stone, several properties that every
exact-repair MSR code must possess are derived. It is then shown that these properties over-constrain
the system in the absence of symbol extension for $d < 2k - 3$, leading to the non-existence of any linear,
exact-repair MSR code in this regime. Finally, an explicit MSR code for $d = k+1$, suited for networks
with low connectivity, is presented. This is the first explicit code in the regenerating codes literature that
does not impose any restriction on the total number of nodes $n$ in the system.

Appendix
Proof of Theorem 3: Reconstruction in the MISER Code

Proof: The reconstruction property is equivalent to showing that the $(B \times B)$ matrix, obtained by
columnwise concatenation of the generator matrices of the $k$ nodes to which the data-collector connects, is
non-singular. We denote this $(B \times B)$ matrix by $D_1$. The proof proceeds via a series of linear, elementary
row and column transformations of $D_1$, obtaining new matrices at each intermediate step, and
the non-singularity of the matrix obtained at the end of this process will establish the non-singularity of
$D_1$.

Since we need to employ a substantial amount of notation here, we will make the connection between
any notation that we introduce here and the notation employed in the example presented in Section V-A.
This example provided the MISER code construction for the case $k = \alpha = 3$, with the scalar selection
$\epsilon = 2$; we will track the case of reconstruction (Section V-A3, case (d)) when the data-collector connects
to the first systematic node (node 1), and the first two parity nodes (nodes 4 and 5).

Let $\delta_1, \ldots, \delta_p$ be the $p$ parity nodes to which the data-collector connects. Let $\omega_1, \ldots, \omega_{k-p}$ ($\omega_1 < \cdots <
\omega_{k-p}$) be the $k-p$ systematic nodes to which the data-collector connects, and $\Omega_1, \ldots, \Omega_p$ ($\Omega_1 < \cdots < \Omega_p$)
be the $p$ systematic nodes to which it does not connect. In terms of this notation, the matrix $D_1$ is

$$D_1 = \left[ G^{(\omega_1)} \ \cdots \ G^{(\omega_{k-p})} \ \ G^{(\delta_1)} \ \cdots \ G^{(\delta_p)} \right]. \tag{116}$$

Clearly, the sets $\{\omega_1, \ldots, \omega_{k-p}\}$ and $\{\Omega_1, \ldots, \Omega_p\}$ are disjoint. In the example, the notation corresponds
to $p = 2$, $\delta_1 = 4$, $\delta_2 = 5$, $\omega_1 = 1$, $\Omega_1 = 2$ and $\Omega_2 = 3$.

Since the data-collector can directly obtain the $(k-p)\alpha$ symbols stored in the $k-p$ systematic nodes it
connects to, the corresponding components, i.e., components $\omega_1, \ldots, \omega_{k-p}$, are eliminated from $D_1$. Now,
reconstruction is possible if the $(p\alpha \times p\alpha)$ matrix $D_2$ is non-singular, where $D_2$ is given by

$$D_2 = \begin{bmatrix}
G_{\Omega_1}^{(\delta_1)} & G_{\Omega_1}^{(\delta_2)} & \cdots & G_{\Omega_1}^{(\delta_p)} \\
\vdots & \vdots & \ddots & \vdots \\
G_{\Omega_p}^{(\delta_1)} & G_{\Omega_p}^{(\delta_2)} & \cdots & G_{\Omega_p}^{(\delta_p)}
\end{bmatrix}. \tag{117}$$

The $(6 \times 6)$ matrix $B_1$ in the example corresponds to the matrix $D_2$ here.

The remaining proof uses certain matrices having specific structure. These matrices are defined in
Table I, along with their values in the case of the example.

TABLE I: Notation: Matrices used in the Proof of Theorem 3



Matrix 


Dimension 


Value 


In the Example 










^f ^f 




S 


a X p 


[Sh = 4''^ V^,j 


S = 


4'^ 4f 
4'^ 4i^ 




S 


p X p 


[Skj=^S^ V*,j 


S = 


4'' 4^' 
4'' 4i' 


= ^2 


Ta,b 


a X p 


a*'* row as [V^f^J • • • "^o^ ]' ^^^ other elements 


Tl,2 = 


-4^^ 4^' 






Ta,b 


p X p 


a*'^ row as [ipn ■ ■ ■ "^o ]> ^11 other elements 


Tl,2 = 















■ 1 " 




Ea,b 


a X p 


Element at position (a, b) as 1, all other elements 


-E'1,2 = 








Ea,b 


p X p 


Element at position (a, b) as 1, all other elements 


-E'1,2 = 


1 






Note first that the $(p \times p)$ matrix $\tilde{\Sigma}$, being a sub-matrix of the Cauchy matrix $\Psi$, is non-singular. Further, note the following relations between the matrices:

\[
T_{a,b}\, \tilde{\Sigma}^{-1} = E_{a,b} \tag{118}
\]

and

\[
\tilde{T}_{a,b}\, \tilde{\Sigma}^{-1} = \tilde{E}_{a,b}. \tag{119}
\]

We begin by permuting the columns of $D_2$. Group the $\Omega_1$th columns of $\{G^{(\delta_m)} \mid m = 1, \ldots, p\}$ as the first $p$ columns of $D_3$, followed by the $\Omega_2$th columns of $\{G^{(\delta_m)} \mid m = 1, \ldots, p\}$ as the next $p$ columns, and so on. Thus, column number $\Omega_i$ of $G^{(\delta_m)}$ moves to the position $p \times (i-1) + m$. Next, group the $\omega_1$th columns of $\{G^{(\delta_m)} \mid m = 1, \ldots, p\}$ and append this group to the already permuted columns, followed by the $\omega_2$th columns, and so on. Thus, column number $\omega_i$ of $G^{(\delta_m)}$ moves to the position $p^2 + p \times (i-1) + m$.

Let $D_3$ be the $(p\alpha \times p\alpha)$ matrix obtained after these permutations. The $(6 \times 6)$ matrix $B_2$ in the example corresponds to the matrix $D_3$ here.
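
The column permutation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `permuted_position` and `column_order` are hypothetical, and it assumes $\alpha = k$ (as in the example), so that every block-column of $D_2$ has exactly one column per systematic node. Each column is identified by the pair $(m, c)$, meaning column $c$ of the block-column contributed by parity node $\delta_m$.

```python
def permuted_position(col_index, m, Omega, omega):
    """1-based position in D3 of column `col_index` of the block-column
    contributed by parity node delta_m (m is 1-based).

    Omega: sorted list of systematic nodes NOT connected to (length p).
    omega: sorted list of systematic nodes connected to (length k - p).
    """
    p = len(Omega)
    if col_index in Omega:
        i = Omega.index(col_index) + 1      # col_index == Omega_i
        return p * (i - 1) + m
    i = omega.index(col_index) + 1          # col_index == omega_i
    return p * p + p * (i - 1) + m


def column_order(Omega, omega):
    """List of (m, col_index) pairs in the order they appear in D3."""
    p = len(Omega)
    alpha = len(Omega) + len(omega)         # assumption: alpha = k
    order = [None] * (p * alpha)
    for m in range(1, p + 1):
        for c in list(Omega) + list(omega):
            order[permuted_position(c, m, Omega, omega) - 1] = (m, c)
    return order


# Example tracked in the text: p = 2, Omega = (2, 3), omega = (1,)
print(column_order([2, 3], [1]))
# [(1, 2), (2, 2), (1, 3), (2, 3), (1, 1), (2, 1)]
```

For the example, the first two positions hold column 2 of the two parity block-columns, the next two hold column 3, and the last two hold column 1, giving $\alpha = 3$ groups of $p = 2$ columns each.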

Next, we note that there are $\alpha$ groups with $p$ columns each in $D_3$. The component-wise grouping of the rows in the parent matrix $D_2$ induces a natural grouping in $D_3$, with its rows grouped into $p$ groups of $\alpha$ rows each. Thus $D_3$ can be viewed as a block matrix, with each block of size $\alpha \times p$, and the dimension of $D_3$ being $p \times \alpha$ blocks. Now, in terms of the matrices defined in Table I, the matrix $D_3$ can be written as



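\[
D_3 = \begin{bmatrix}
\epsilon\Sigma & T_{\Omega_2,1} & \cdots & T_{\Omega_p,1} & T_{\omega_1,1} & \cdots & T_{\omega_{k-p},1} \\
T_{\Omega_1,2} & \epsilon\Sigma & \cdots & T_{\Omega_p,2} & T_{\omega_1,2} & \cdots & T_{\omega_{k-p},2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
T_{\Omega_1,p} & T_{\Omega_2,p} & \cdots & \epsilon\Sigma & T_{\omega_1,p} & \cdots & T_{\omega_{k-p},p}
\end{bmatrix} \tag{120}
\]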



Next, as the data-collector can perform any linear operation on the columns of $D_3$, we multiply the last $(k-p)$ block-columns (i.e., blocks of $p$ columns each) in $D_3$ by $\tilde{\Sigma}^{-1}$ (while leaving the other block-columns unchanged). Using equation (118), the resulting $(p\alpha \times p\alpha)$ matrix is



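\[
D_4 = \begin{bmatrix}
\epsilon\Sigma & T_{\Omega_2,1} & \cdots & T_{\Omega_p,1} & E_{\omega_1,1} & \cdots & E_{\omega_{k-p},1} \\
T_{\Omega_1,2} & \epsilon\Sigma & \cdots & T_{\Omega_p,2} & E_{\omega_1,2} & \cdots & E_{\omega_{k-p},2} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
T_{\Omega_1,p} & T_{\Omega_2,p} & \cdots & \epsilon\Sigma & E_{\omega_1,p} & \cdots & E_{\omega_{k-p},p}
\end{bmatrix} \tag{121}
\]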



The $(6 \times 6)$ matrix $B_3$ in the example corresponds to the matrix $D_4$ here.
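
The step from $D_3$ to $D_4$ rests on relation (118): the only non-zero row of $T_{a,b}$ is the $b$th row of the $(p \times p)$ sub-matrix $\tilde{\Sigma}$, so right-multiplying by $\tilde{\Sigma}^{-1}$ leaves a single $1$ at position $(a, b)$. The following sketch checks this numerically. It is an illustration under stated assumptions, not the paper's procedure: it works over the rationals rather than a finite field, uses an arbitrarily chosen Cauchy matrix with entries $1/(x_i - y_j)$, and all helper names (`cauchy`, `matmul`, `inverse`, `relation_118_holds`) are hypothetical.

```python
from fractions import Fraction


def cauchy(xs, ys):
    """Cauchy matrix with entries 1 / (x - y); all x, y must be distinct."""
    return [[Fraction(1, x - y) for y in ys] for x in xs]


def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Gauss-Jordan inverse over the rationals (M must be non-singular)."""
    n = len(M)
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pv = A[c][c]
        A[c] = [x / pv for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [row[n:] for row in A]


def relation_118_holds(Psi, Omega, delta_cols):
    """Check T_{a,b} * inv(Sigma_tilde) == E_{a,b} for all a, b.

    Psi: alpha x (n - k) matrix; Omega: 1-based rows not connected to;
    delta_cols: 0-based columns of Psi for the connected parity nodes.
    """
    alpha, p = len(Psi), len(Omega)
    sigma_tilde = [[Psi[O - 1][d] for d in delta_cols] for O in Omega]
    inv = inverse(sigma_tilde)
    for a in range(1, alpha + 1):
        for b in range(1, p + 1):
            T = [[Fraction(0)] * p for _ in range(alpha)]
            T[a - 1] = list(sigma_tilde[b - 1])   # a-th row = b-th row of Sigma_tilde
            E = [[Fraction(0)] * p for _ in range(alpha)]
            E[a - 1][b - 1] = Fraction(1)
            if matmul(T, inv) != E:
                return False
    return True


# Setting of the tracked example: alpha = 3, Omega = (2, 3), parity nodes 4 and 5
Psi = cauchy([1, 2, 3], [-1, -2, -3])   # illustrative values, not from the paper
print(relation_118_holds(Psi, [2, 3], [0, 1]))
```

The same check with $\alpha$ replaced by $p$ covers relation (119), since $\tilde{T}_{a,b}$ and $\tilde{E}_{a,b}$ differ from $T_{a,b}$ and $E_{a,b}$ only in their number of rows.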

Observe that in the block-columns ranging from $p+1$ to $\alpha$ of the matrix $D_4$, every individual column has exactly one non-zero element. The message symbols associated with these columns of $D_4$ are now available to the data-collector, and their effect on the rest of the encoded symbols can be subtracted out to get the following $(p^2 \times p^2)$ matrix



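\[
D_5 = \begin{bmatrix}
\epsilon\tilde{\Sigma} & \tilde{T}_{2,1} & \cdots & \tilde{T}_{p,1} \\
\tilde{T}_{1,2} & \epsilon\tilde{\Sigma} & \cdots & \tilde{T}_{p,2} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{T}_{1,p} & \tilde{T}_{2,p} & \cdots & \epsilon\tilde{\Sigma}
\end{bmatrix} \tag{122}
\]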



The matrix $D_5$ here is the $(4 \times 4)$ matrix $B_4$ in the example. The problem is now equivalent to reconstruction in the MISER code with the parameter $k$ equal to $p$, when a data-collector attempts data recovery from the $p$ parity nodes alone. Hence, a general decoding algorithm for data collection from the parity nodes alone also applies to the present case, where data collection is done partially from systematic nodes and partially from parity nodes. The decoding procedure for this case is provided below.



The example detailed in case (c) of Section V-A3, where the data-collector connects to all three parity nodes, corresponds to this general case with $p = 3$, $\tilde{\Sigma} = \Psi_3$ and $D_5 = C_2$. We will track this case in the sequel.



The data-collector multiplies each of the $p$ block-columns in $D_5$ by $\tilde{\Sigma}^{-1}$. From equation (119), the resultant $(p^2 \times p^2)$ matrix is



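\[
D_6 = \begin{bmatrix}
\epsilon I_p & \tilde{E}_{2,1} & \tilde{E}_{3,1} & \cdots & \tilde{E}_{p,1} \\
\tilde{E}_{1,2} & \epsilon I_p & \tilde{E}_{3,2} & \cdots & \tilde{E}_{p,2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\tilde{E}_{1,p} & \tilde{E}_{2,p} & \tilde{E}_{3,p} & \cdots & \epsilon I_p
\end{bmatrix} \tag{123}
\]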



The $(9 \times 9)$ matrix $C_3$ in the example corresponds to the matrix $D_6$ here.



For $i = 1, \ldots, p$, the $i$th column in the $i$th block-column contains exactly one non-zero element (which is in the $i$th row of the $i$th block-row). It is evident that the message symbols corresponding to these columns are now available to the data-collector, and their effect can be subtracted from the remaining symbols. This intermediate matrix corresponds to the $(6 \times 6)$ matrix $C_4$ in the example. Next, we rearrange the resulting matrix by first placing the $i$th column of the $j$th block-column adjacent to the $j$th column of the $i$th block-column, and then repeating the same procedure for the rows, to get a $((p^2 - p) \times (p^2 - p))$ matrix $D_7$ as

\[
D_7 = \begin{bmatrix}
\epsilon & 1 & & & \\
1 & \epsilon & & & \\
 & & \ddots & & \\
 & & & \epsilon & 1 \\
 & & & 1 & \epsilon
\end{bmatrix}
\]

This is a block diagonal matrix which is non-singular, since each $(2 \times 2)$ diagonal block has determinant $\epsilon^2 - 1$ and $\epsilon^2 \neq 1$. Thus the remaining message symbols can be recovered by decoding them in pairs.
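
The final two steps can be checked mechanically. The sketch below builds a matrix with the block structure of $D_6$ for a given $p$ and $\epsilon$, deletes the $p$ rows and columns that carry only $\epsilon$, reorders the remaining indices so that position $j$ of block $i$ sits next to position $i$ of block $j$, and confirms that the result is block diagonal with $(2 \times 2)$ blocks. It is an illustrative sketch only: arithmetic is over the rationals rather than the finite field of the code, and the function names are hypothetical.

```python
from fractions import Fraction


def build_D6(p, eps):
    """p^2 x p^2 matrix: eps * I_p on the block diagonal, and in block
    (i, j), i != j, a single 1 at row j, column i of that block."""
    n = p * p
    D = [[Fraction(0)] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            if i == j:
                for r in range(p):
                    D[i * p + r][i * p + r] = Fraction(eps)
            else:
                D[i * p + j][j * p + i] = Fraction(1)
    return D


def reduce_to_D7(D6, p):
    """Drop index i of block i, then pair index j of block i with index i
    of block j; the same reordering is applied to rows and columns."""
    idx = []
    for i in range(p):
        for j in range(i + 1, p):
            idx += [i * p + j, j * p + i]
    return [[D6[r][c] for c in idx] for r in idx]


def is_block_diagonal_in_pairs(D7, eps):
    """True if D7 = diag([[eps, 1], [1, eps]], ...)."""
    n = len(D7)
    for r in range(n):
        for c in range(n):
            if r == c:
                expected = eps
            elif r // 2 == c // 2:
                expected = 1
            else:
                expected = 0
            if D7[r][c] != expected:
                return False
    return True


def solve_pair(eps, y1, y2):
    """Solve eps*x1 + x2 = y1, x1 + eps*x2 = y2 (requires eps^2 != 1)."""
    det = eps * eps - 1
    return (eps * y1 - y2) / det, (eps * y2 - y1) / det


# Case tracked in the text: p = 3, eps = 2 (D6 is 9 x 9, D7 is 6 x 6)
D7 = reduce_to_D7(build_D6(3, 2), 3)
print(len(D7), is_block_diagonal_in_pairs(D7, 2))
```

With $p = 3$ this yields a $(6 \times 6)$ matrix made of three $(2 \times 2)$ blocks, and `solve_pair` inverts one such block, which is the pairwise decoding referred to above.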



